Abstract

Many heap-oriented languages such as Lisp and Id depend on run-time garbage collection to re- 
claim storage. Garbage collection can be a significant run-time expense, especially for functional 
languages that tend to allocate structures often. Compiler-directed storage reclamation reduces 
the run-time overhead of garbage collection by having the compiler insert deallocation code. Com- 
pilers must perform object lifetime analysis in order to insert storage reclamation code. Current 
approaches to lifetime analysis assume a strict or sequential interpreter. 

We formulate an operational semantics for a parallel, non-strict language in order to precisely 
define when it is safe to deallocate an object. Our operational semantics yields exact information 
about what objects are allocated, deallocated, and referenced at any point during the execution of 
a program. Using this information, we define precise run-time conditions that must be met by safe 
deallocation commands. 

We use abstract interpretation to yield at compile-time a summary of what objects are allocated 
and reachable at any point in a program. We define static conditions that must be met by safe 
deallocation commands. We then define an algorithm that uses the abstract interpreter to verify the 
safety of deallocation commands already in programs and an algorithm to insert safe deallocation 
commands into programs. 

We describe our implementation of the lifetime analysis, the verification algorithm, and the insertion 
algorithm. We then discuss the effectiveness of the compiler at verifying and inserting deallocation
commands in several medium-sized Id programs. We also discuss the performance of each program 
in terms of storage allocated and reclaimed. Our implementation is quite effective for programs 
with simple patterns of sharing between objects. 
Chapter 1



Introduction 



Many modern programming languages are oriented towards dynamic storage management. Lisp- 
like languages and functional languages such as Id [35, 33, 34] are heap-oriented languages. In 
these languages the storage for arrays and other aggregate objects is not necessarily associated 
with the invocation of the particular procedure that allocated the object. Storage for an aggregate 
is allocated on a heap when the aggregate is created so objects can last longer than the procedure 
that created them. The storage in which an object resides can only be reclaimed when the program 
no longer uses the object, where a use of an object is a reference to the contents of the object. The 
lifetime of an object is the period of time from the object's allocation until the time when the last 
reference is made to the object. 
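
As an illustration of this definition, the following minimal sketch computes object lifetimes from a recorded execution trace. The trace format (a list of "alloc" and "ref" events, with the list index standing in for time) and the name `lifetimes` are assumptions made for this example; they are not notation used elsewhere in this thesis, and Python is used only for convenience.

```python
def lifetimes(trace):
    """Map each object to (allocation step, step of last reference).

    `trace` is a hypothetical list of (event, obj) pairs, where event is
    "alloc" or "ref"; the list index plays the role of time.
    """
    spans = {}
    for step, (event, obj) in enumerate(trace):
        if event == "alloc":
            spans[obj] = (step, step)      # no reference yet
        elif event == "ref":
            start, _ = spans[obj]          # each reference extends the lifetime
            spans[obj] = (start, step)
    return spans


trace = [("alloc", "a"), ("alloc", "b"), ("ref", "a"), ("ref", "b"), ("ref", "a")]
print(lifetimes(trace))
```

Here object `a` is live from step 0 to step 4 and `b` from step 1 to step 3; storage for `b` could in principle be reclaimed before that of `a`, even though `b` was allocated later.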

Storage in heap-oriented languages is often reclaimed implicitly, by a garbage collector. A garbage 
collector intermittently or incrementally traverses the heap, stacks, and other program data struc- 
tures to determine which objects are reachable by the program, and then clears all other objects 
(the garbage) from the heap. 
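
The traversal described above can be sketched as a simple mark-and-sweep pass. This is a generic illustration, not the collector of any particular Id implementation; the heap representation (a dictionary from object names to the objects they point to) and the name `collect` are assumptions.

```python
def collect(heap, roots):
    """Return (live, garbage) for a heap given as {obj: [objects it points to]}."""
    live = set()
    worklist = list(roots)
    while worklist:                       # mark phase: follow pointers from the roots
        obj = worklist.pop()
        if obj not in live:
            live.add(obj)
            worklist.extend(heap[obj])
    garbage = set(heap) - live            # sweep phase: everything unmarked is garbage
    for obj in garbage:
        del heap[obj]
    return live, garbage


heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
live, garbage = collect(heap, roots=["a"])
```

Note that `c` and `d` refer to each other but are unreachable from the root, so both are reclaimed. The cost of the mark phase is the run-time overhead that compiler-directed reclamation tries to avoid.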

The standard alternative to garbage collection, explicit storage management, requires the pro- 
grammer to insert commands to reclaim storage so that the program does not run out of memory. 
Explicitly deallocating structures is often an extremely error-prone process, because it is not always 
clear where a structure is passed and when it is no longer referenced. Furthermore, changes to a 
program can cause deallocation commands to become incorrect. 

It is easier to develop correct programs when using implicitly managed storage, because the pro- 
grammer does not have to worry about storage management. Unfortunately, implicit storage man- 
agement is typically more expensive than explicit storage management. One source of overhead in 
implicit storage management is the determination of which storage is no longer in use. Another 
source of overhead is that garbage collected systems typically use more storage at any point in time 
than explicitly managed systems because they reserve a significant fraction of memory (up to half) 
for use solely during the garbage collection process. Furthermore, storage is usually not reclaimed 
as soon as it ceases to be used in a garbage collected system, and so more storage is allocated at 
any point in time than in an explicitly managed system. 

In this thesis, we present a third possibility: implicitly managed storage without all of the run-time 
overhead of garbage collector managed storage. We accomplish this by having the compiler analyze 
programs and insert storage deallocation commands, thus lifting much of the burden of storage
management from the programmer. The compiler will not be able to pick up all of the garbage, 
and so the rest will have to be handled by the programmer or by a run-time garbage collector. In 
this thesis we develop and evaluate a method of static analysis of programs to determine object
lifetimes. Our goal is to determine if this analysis is useful and to determine to what extent storage 
management can be done by the compiler. General issues of garbage collection and heap-allocation 
algorithms are orthogonal to this work. 

This thesis makes a number of contributions to the area of lifetime analysis and to the practice of 
static program analysis. First, it develops a general framework for the lifetime analysis of a parallel 
language using the theory of abstract interpretation. Second, it defines abstract representations for 
a variety of data types, including tuples, arrays, algebraic types, recursive types, and higher-order 
functions. Third, it attempts to characterize the costs and effectiveness of these techniques when 
applied to real programs. 

Past work in lifetime analysis has mostly been on sequential languages. We know of no work which 
performs lifetime analysis on either sequential or parallel non-strict languages. The lifetime analysis 
method described in this thesis applies to parallel, strongly-typed, single-assignment languages. 
Slight variations in the methods we use allow us to analyze either strict or non-strict programs. 
We present the work in terms of a non-strict language and discuss the changes necessary to apply 
our methods to a strict language. 

Our main goal was to develop a framework for lifetime analysis and to determine its effectiveness. 
In fact, we have developed a general framework for abstract interpretation of parallel and non- 
strict languages with a rich variety of types. This abstract interpreter could be used to perform 
interference analysis or even strictness analysis instead of lifetime analysis, although we have not 
pursued these topics. We do consider a limited form of sharing analysis to determine if the elements 
of arrays may be shared. We also discuss extensions to the lifetime analysis of recursive types that 
would allow us to determine whether objects form directed acyclic graphs or trees. 

We have found our implementation of these methods to be quite effective in determining the lifetimes 
of objects in real Id programs. We have implemented this work as part of the Id compiler [40] and 
applied it to several programs of 100 to 1000 lines. The implementation is structured to support
separate compilation: a program can be compiled in bottom-up fashion, and each procedure is
verified or transformed individually. The augmented compiler was able to compile these programs into
object code that deallocated 80 to 100 percent of the total storage that they allocated, at a cost 
of a factor of 1.5 to 5 increase in compile-times. We have also found that although a programmer 
could insert all of the deallocation code that the compiler inserted, it would require a major change 
in programming style to do so. 

1.1 Thesis Overview 

Before we can develop a lifetime analysis algorithm, we must have a well-defined notion of the 
lifetime of an object. We defined the lifetime of an object to be the period of time during the 
execution of a program from when the object was allocated until the object was no longer referenced. 
Lifetime analysis is the process of determining the range of program points during which the object 
bound to a particular variable may be referenced by a program. Thus, lifetime analysis is intimately 
related to the operational semantics of a program. 

In this thesis, we first develop an operational semantics for a non-strict, parallel language. The 
operational semantics is defined by an interpreter that gives the standard semantics of a program 
in terms of its behavior. We then use this semantics to define the lifetimes of objects allocated 
by programs. Our definition of object lifetimes is exact, but can only be determined at run-time, 
as a program is executing. We also use this interpreter to define when deallocation commands
are correct. Correct deallocation commands never lead to run-time errors in which an object is 
deallocated before the end of its lifetime. 
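
The correctness condition can be illustrated on a recorded execution trace: a deallocation is incorrect if any reference to the object follows it. The sketch below is only an illustration of that condition; the event names ("alloc", "ref", "free") and the function `first_unsafe_free` are hypothetical and do not correspond to the instrumented interpreter developed later.

```python
def first_unsafe_free(trace):
    """Return the step of the first reference to an already freed object, or None."""
    freed = set()
    for step, (event, obj) in enumerate(trace):
        if event == "free":
            freed.add(obj)
        elif event == "ref" and obj in freed:
            return step                   # the object was freed before its lifetime ended
    return None


safe = [("alloc", "a"), ("ref", "a"), ("free", "a")]
unsafe = [("alloc", "a"), ("free", "a"), ("ref", "a")]
print(first_unsafe_free(safe), first_unsafe_free(unsafe))
```

A check of this kind is only possible at run-time, on one particular execution; the rest of the thesis is concerned with establishing the same property at compile-time for all executions.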

Because we must be able to determine object lifetimes at compile-time, we need some way of 
converting this exact, run-time notion of object lifetimes into a compile-time property. We use an 
abstraction of the standard interpreter to give an approximation of the behavior of the program. 
The purpose of the abstract interpreter is to generate an approximation of a program's behavior 
over all input data, and, in the case of parallel programs, over all execution orders. In addition, the 
abstract interpreter must be decidable — we must be able to compute this approximate behavior 
in a finite amount of time. 

Given the abstract interpretation of a program, we show how to compute approximate object 
lifetimes. We are willing to use approximate object lifetimes in order to develop an algorithm that 
terminates, as long as the approximations are all safe. A safe approximation of an object's lifetime 
is guaranteed to include the actual lifetime of an instance of that object at run-time. 
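
A minimal sketch of this containment condition follows, assuming that lifetimes are represented as (start, end) intervals over program points. That representation and the name `is_safe` are assumptions made for illustration only.

```python
def is_safe(approx, instances):
    """True if the approximate interval contains every actual lifetime.

    Intervals are hypothetical (start, end) pairs of program points.
    """
    lo, hi = approx
    return all(lo <= start and end <= hi for start, end in instances)


actual = [(2, 7), (3, 5)]             # lifetimes of two run-time instances
print(is_safe((0, 10), actual))       # approximation covers both instances
print(is_safe((0, 6), actual))        # too short: the first instance outlives it
```

An approximation that is too long only delays reclamation, whereas one that is too short could lead to a deallocation before the last reference; only the former is acceptable.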

We present two algorithms that use the information about object lifetimes. The first algorithm 
verifies that the deallocation commands in a program are all safe. The second algorithm inserts safe 
deallocation commands into programs automatically. This second algorithm allows the programmer 
to write programs in which storage is implicitly reclaimed. 

For the remainder of this thesis, we talk about the language KID - [3], a specific parallel, non-strict, 
single-assignment language with higher-order functions. KID, or Kernel Id, is an intermediate 
language developed by Ariola and Arvind to express the semantics of Id [35, 33, 34] and to express 
the compilation of Id programs. In this thesis, we consider KID - to be KID without higher-order 
functions and M-structures [7] (structures with per-element mutual exclusion). 

In Chapter 2 we present the syntax and standard semantics of KID - and we discuss the unusual 
evaluation strategy used by the KID - interpreter. In Chapter 3 we develop an augmented, or 
instrumented, interpreter that allows us to define exactly when deallocation commands are correct
and incorrect. 

In Chapters 4 and 5 we restrict KID - programs to operate only on tuples, numbers, and booleans. 
In the first of these chapters we develop an abstracted interpreter for KID - and show that it is safe 
with respect to the instrumented interpreter. Our definition of safety is that object reachability 
must be preserved by the abstract interpreter. In the second of these chapters we use the abstracted 
interpreter to give an algorithm for verifying safe deallocation commands and an algorithm for 
inserting deallocation commands. In Chapter 6 we discuss a method for improving the effectiveness 
of the lifetime analysis by improving the abstract interpreter. 

In Chapters 7, 8, and 9 we describe the additions to the abstract interpreter necessary to handle 
arrays, algebraic types, recursive types, and higher-order functions. 

In Chapter 10, we describe our implementation of the deallocation command verification and in- 
sertion algorithms and their effectiveness on several programs. Finally, we give our conclusions on 
this work in Chapter 11. 

The remainder of this chapter gives some background on lifetime analysis and storage management. 
Section 1.2 describes previous work relating to lifetime analysis. Section 1.3 describes the assump- 
tions we make about storage management. Section 1.4 compares the cost of garbage collection 
with the cost of explicit storage management. Section 1.5 describes explicit storage management in 
Id programs. Finally, Section 1.6 describes the safety condition that must be met by deallocation 
commands in Id programs. 
1.2 Background

This section gives some background on the problem of storage management. We start by describing 
the various storage management strategies, and then we go into more detail about the techniques 
used in implicit storage management. There are several ways in which each technique could be 
classified, and so the division of techniques into groups is somewhat arbitrary. 

The problem of storage management has existed since the first computer program was written. In 
early programming languages such as Fortran, storage is statically allocated by the programmer. 
Under the static management paradigm, the programmer or compiler allocates storage for all 
structures by creating a memory map that places each object in a fixed position. In early Fortran 
implementations, all procedure activations and all data structures were statically allocated. In 
modern computer languages, some data structures may be statically allocated. There is no direct 
run-time cost for the management of statically allocated storage — this all occurs at compile-time 
when the memory map is constructed. 

Static allocation is not always possible: the activation frames for recursive procedures cannot be 
statically allocated. A separate activation frame must be allocated for each recursive procedure 
invocation. For this reason, procedure activation frames, including storage for procedure-local 
objects, are usually stack managed. Temporary structures, declared locally in procedures, may 
also be stack allocated. 

Under stack management, storage is managed by having a pointer to the next word to be allocated, 
incrementing this pointer to allocate storage and decrementing this pointer to deallocate storage. 
Under stack discipline, objects must be deallocated in the reverse order from which they were 
allocated (last-in-first-out). Stack management allows greater flexibility than static allocation, 
because the number and size of objects does not have to be known at compile-time. 
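
The pointer discipline described above can be sketched as follows. The class `StackAllocator` and its method names are hypothetical, and addresses are modeled as word indices rather than machine addresses.

```python
class StackAllocator:
    """A toy stack allocator over a fixed number of words."""

    def __init__(self, size):
        self.size = size
        self.top = 0                 # index of the next free word

    def allocate(self, nwords):
        if self.top + nwords > self.size:
            raise MemoryError("stack overflow")
        address = self.top
        self.top += nwords           # increment the pointer to allocate
        return address

    def deallocate(self, nwords):
        if nwords > self.top:
            raise ValueError("deallocating more than is allocated")
        self.top -= nwords           # decrement the pointer to deallocate (LIFO)


stack = StackAllocator(8)
a = stack.allocate(3)
b = stack.allocate(2)
stack.deallocate(2)                  # b must be released before a
c = stack.allocate(4)                # reuses the words that b occupied
```

Both operations cost a single addition, which is why stack management is cheap; the price is that `a` cannot be released while `b` is still allocated.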

Objects allocated on the same stack as activation frames are automatically deallocated when the 
procedure that allocated them returns to its parent. If a pointer to this object is returned to 
the parent procedure, the parent may attempt to refer to the contents of a defunct structure. 
This scenario is known as the dangling pointer problem. The danger is that the object may be 
overwritten when another procedure call is made, and that the parent will thereafter read spurious 
data. 
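
The scenario can be reproduced with a toy model of a stack, sketched below. The names `memory`, `callee_a`, and `callee_b` are hypothetical, and each "frame" is reduced to a single word for brevity.

```python
memory = [0] * 8          # a toy stack of words
top = 0                   # index of the next free word


def callee_a():
    global top
    addr = top            # allocate a one-word object in this activation
    top += 1
    memory[addr] = 42
    top -= 1              # returning pops the frame, but the address escapes
    return addr


def callee_b():
    global top
    addr = top            # reuses the word that callee_a just released
    top += 1
    memory[addr] = 99
    top -= 1


dangling = callee_a()
before = memory[dangling]     # still reads the old contents
callee_b()
after = memory[dangling]      # the same address now holds unrelated data
```

The parent reads 42 through the returned address immediately after the first call, but 99 after the second call has overwritten the released word.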

Sometimes it is necessary for objects to survive longer than the procedures that allocated them. 
In this case, they must be handled by a heap management algorithm that allows objects to be 
allocated and deallocated in an arbitrary order. Heap management increases the expressiveness of 
a language but complicates the storage manager by making it more expensive computationally. The 
heap manager must keep track of which storage is in use and which storage is free to be allocated, 
while trying to minimize wasted storage due to mismatches between the sizes of objects requested 
and the sizes of objects actually allocated. 

In many languages that support heap allocation, such as C, both allocation and deallocation must 
be specified by the programmer. If the programmer does not deallocate structures that are no 
longer needed, then the program consumes an inordinate amount of memory, possibly causing the 
program to fail. If the programmer deallocates an object too soon, then the program may behave 
incorrectly due to a dangling pointer error. 

The explicit deallocation of structures whose lifetime is not tied to that of a procedure invocation 
is difficult — the programmer must ensure that no more references to an object are made anywhere 
in the program. The difficulty is increased if the pattern of sharing among objects is complex. 
For this reason, many heap-oriented languages have run-time system support to automatically (or
implicitly) deallocate storage that is no longer in use. 

In heap-oriented languages such as Lisp, storage management is implicit: all structures allocated 
are typically allocated on a heap, and the run-time system automatically takes care of deallocating 
structures that are no longer accessible from the program. Structures that are not accessible from 
the program are called garbage. The garbage collector, a part of the run-time system, periodically 
scans the heap and invocation stacks, and finds and reclaims all unreachable objects. In general, 
garbage collection is more expensive than explicit heap management; in addition to reclaiming 
storage the garbage collector must determine which objects are garbage. The benefit from using 
a garbage collected system is that the user does not have to worry about not deallocating enough 
storage or about deallocating storage too early. 

1.2.1 Lifetime Analysis 

Lifetime analysis was first suggested by Barth [5] as an optimization to shift some of the run-time 
overhead of garbage collection to compile-time. His approach is to take Lisp programs that had 
reference counting code inserted, and to use dataflow (live variable) analysis to determine that 
a particular variable in the program will always be associated at run-time with a structure with 
reference count 1. When a variable is determined to be dead, then code can be inserted to free 
the associated structure. Barth also discusses several local transformations that optimize reference 
counting code inserted by the compiler. Although his method only inserts deallocation code if it 
determines that there is exactly one reference to a structure, he claims that this optimization is 
powerful enough to reclaim a significant amount of temporary storage in Lisp programs, because 
studies by Clark [11] show that most structures in Lisp programs are referred to exactly once. 
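
For context, the run-time bookkeeping that this kind of analysis aims to resolve at compile-time can be sketched as below. This is a generic reference counting scheme, not Barth's algorithm; the class `RefCountHeap` and its method names are hypothetical.

```python
class RefCountHeap:
    """A toy heap that frees an object when its reference count reaches zero."""

    def __init__(self):
        self.counts = {}
        self.freed = []

    def allocate(self, obj):
        self.counts[obj] = 1          # the allocating variable holds the only reference

    def share(self, obj):
        self.counts[obj] += 1         # another variable or structure now refers to obj

    def drop(self, obj):
        self.counts[obj] -= 1         # a referring variable has become dead
        if self.counts[obj] == 0:
            del self.counts[obj]
            self.freed.append(obj)


heap = RefCountHeap()
heap.allocate("temp")
heap.drop("temp")                     # count was 1, so the object is freed
heap.allocate("shared")
heap.share("shared")
heap.drop("shared")                   # count was 2, so the object survives
```

When the compiler can show that a variable such as the holder of `temp` is dead and that the count is always 1 at that point, the decrement and test can be replaced by an unconditional free.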

1.2.2 Dataflow Analysis 

Barth's method was limited because the analysis could not follow pointers or procedure calls. There 
have been several approaches that attempt to solve these problems. 

Ruggieri and Murtagh [39] developed an interprocedural lifetime analysis framework for a statically 
typed, monomorphic language. Their algorithm computes the set of object sources which may be 
bound to each variable before each statement in the program is executed. They represent nested 
objects as subvariables, with labeled edges connecting variables with the contents of their various 
fields. Recursively typed objects have a potentially infinite number of subvariables; so Ruggieri and 
Murtagh introduce an operator that summarizes an infinite graph of subvariables by one in which 
the longest path is bounded by n, where n is a parameter of the analysis. 
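
The effect of bounding path length can be illustrated with a small sketch in which access paths are tuples of field names and every path longer than n is folded onto its length-n prefix. This is a simplification for illustration, not Ruggieri and Murtagh's actual operator; the names `limit` and `summarize` are hypothetical.

```python
def limit(path, n):
    """Truncate an access path to at most n fields; deeper paths collapse onto it."""
    return path if len(path) <= n else path[:n]


def summarize(paths, n):
    """Summarize a collection of access paths by their bounded representatives."""
    return {limit(p, n) for p in paths}


# Paths into a list: x.tl, x.tl.tl, x.tl.tl.tl, ...
paths = [("tl",) * k for k in range(1, 50)]
summary = summarize(paths, n=3)
```

However many cells the list has, the summary contains only three paths. A larger n gives a more precise but larger representation.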

Larus and Hilfinger [29] developed an analysis similar to Ruggieri's which computes the possible 
aliases between structure accesses. They show how to use standard dataflow techniques to compute 
their alias graphs. They also show that precise computation of alias relations in a single function 
is NP-complete. 

Hendren and Nicolau [20] take a different approach to solving the finite representation problem. 
They define an analysis framework that uses path matrices to do interference analysis for par- 
allelization. These path matrices show the paths of possible interference between two successive 
program points. Each element of a path matrix uses a regular expression of field names to name an 
access path through a recursively typed object. This naming scheme guarantees that access paths 
are of finite size. Hendren and Nicolau's method automatically detects non-shared lists and trees
in an imperative language. The interference analysis Hendren and Nicolau developed can be recast 
as a lifetime analysis by determining all the statements from which a given structure is reachable 
— the control region bounded by those statements bounds the lifetime of the structure. 

Chase, Wegman and Zadeck [10] attempt to improve the method by which information about data 
structures is summarized. Their method takes programs in static single assignment form [14] and 
constructs a storage shape graph (SSG) that represents the interconnectedness of structures in the 
heap. Each node in the graph represents a structure allocated by a different allocation statement. 
The number of nodes in an SSG is bounded by the sum of the number of allocation statements and 
the number of variables in a program. Storage shape graphs are augmented with heap reference 
counting to determine the lifetime of a structure and to determine if a structure is acyclic. 

1.2.3 Analyses Based on Abstract Interpretation 

These techniques all consist of a set of ad hoc rules for analyzing programs. Cousot and Cousot [12] 
developed abstract interpretation, a method for simulating the execution of a program in order to 
determine the behavior of a program. The use of abstract interpretation allows the derivation of 
an analysis framework from the operational semantics of a programming language. 

Jones and Muchnik [27] used abstract interpretation to develop a general framework for interpro- 
cedural dataflow analysis of programs with recursive data structures. They extend the Cousots' 
work on dataflow analysis of flowcharts to work with recursive data structures. They use tokens 
to provide local representations of lists. Tokens are labels derived from program states. Their flow 
analyzer constructs a retrieval function that takes a token and reconstructs the list or lists locally 
described by that token. This retrieval function is really an abstraction of a store, where a store 
maps locations to list values. 

Jones and Muchnik describe a version that analyzes a simple first-order language. This version uses 
node labels as tokens, and divides tokens into atoms and lists. Their general framework could be 
adapted to a variety of analyses by plugging in the appropriate domains and operational semantics. 
There is a great deal of freedom in choosing tokens. Tokens can be more specific, e.g., whole states, 
in which case the analysis will be more precise but computationally intractable, or more general, 
e.g., node labels, in which case the analysis will converge faster but give less precise information. 

Horwitz, Pfeiffer, and Reps [22] use the Jones and Muchnik framework to compute an abstraction 
of memory where each location is labeled by the program points that modify its contents. They 
show that their analysis is correct for all implementations of the underlying operational semantics. 
The framework of Horwitz et al. does not do interprocedural analysis.

One enhancement to this framework is the ability to handle higher-order functions. Deutsch [15] 
develops a static analysis method for determining the aliasing and lifetimes of objects in a strict, 
higher-order functional language with first class continuations. His work is also based on that of 
Jones and Muchnik. Deutsch presents a low-level operational semantics defined in terms of state 
transition rules, and abstracts this semantics to obtain an analysis algorithm. He uses complete 
program states to label objects uniquely in the standard semantics and uses an abstraction of 
program states to label objects in the abstract semantics. 

Rather than presenting a low-level operational semantics, Harrison [19] presents an analysis in 
terms of a high-level operational semantics for Scheme. Harrison develops an analysis that could 
be used to make storage management and parallelization decisions about Scheme programs with 
first class continuations, side effects and higher-order functions. His work takes an approach similar
to that of Jones and Muchnik. The correct modeling of control flow in the presence of continuations 
adds to the complexity of Harrison's method. He uses procedure strings to name all points in the 
execution of a program. A procedure string consists of a sequence of symbols naming the procedure 
bodies that have been entered and exited along the execution path to a program point. Harrison 
models aggregate objects as higher-order functions. 
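
The idea of naming an execution point by the procedures entered and exited on the way to it can be sketched as follows. This is only an illustration of the concept, not Harrison's formalism; the decorator `traced` and the event encoding are assumptions.

```python
events = []                               # the procedure string built so far


def traced(fn):
    """Record an entry and an exit symbol around every call of fn."""
    def wrapper(*args):
        events.append(("enter", fn.__name__))
        try:
            return fn(*args)
        finally:
            events.append(("exit", fn.__name__))
    return wrapper


@traced
def g(x):
    return x + 1


@traced
def f(x):
    return g(x) * 2


result = f(1)
print(events)
```

The prefix of `events` recorded at any moment identifies the corresponding point in the execution, for example "inside g, called from f".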

1.2.4 Analysis of Parallel Languages 

All of these techniques were developed for sequential programming languages, even though the orig- 
inal work on abstract interpretation was defined in terms of flow graphs, which are not necessarily 
sequential. Much of the work on abstract interpretation has been done on functional languages, 
which are often touted as being parallel languages. Even so, most of the work on lifetime analysis of 
functional languages has been done with respect to a sequential implementation. There have been 
a few approaches that do not assume a sequential implementation, which we will describe below. 

Hudak [23] describes an analysis based on abstract interpretation of a reference counting interpreter 
for a strict, functional language operating on arrays of numbers. Even though the language is 
functional, the denotational semantics he presents is sequential, because it performs side effects in 
the form of reference counting operations. 

Thomas Johnsson [26] developed an analysis method for modeling heap contents based on the 
framework of Jones and Muchnik. His analysis is to be used in optimizing graph reduction in- 
termediate code that resulted from compiling a lazy, functional language. Although the language 
being compiled is not sequential, the interpreter of the intermediate code is sequential. The in- 
termediate code is imperative and contains explicit code to construct and evaluate closures. The 
parallelism in the source language is simulated by interleaving execution of subexpressions in the 
intermediate code. 

Ranelletti [38] describes an analysis method on dataflow graphs representing parallel programs 
written in SISAL [16]. These dataflow graphs only give a partial order on the execution order of 
expressions in a program. This method allows the compiler to transform graphs so that storage 
is preallocated for arrays that are incrementally defined by a program. Preallocation reduces the 
number of arrays that need to be allocated and reduces the number of times array elements are 
copied from one array to another. Ranelletti's method is very efficient: it takes O(n)
compile-time, where n is the size of the program being analyzed. Unfortunately, extending it to
handle interprocedural analysis will make it much less efficient: it will take O(2^n) compile-time.

Cann [9] describes an analysis technique on SISAL dataflow graphs that allows arrays or array dope- 
vectors to be updated in place whenever it can be shown that the updater is the only consumer of 
the array. This method is also based on parallel programs. However, some of his transformation 
techniques add dependence edges that increase the sequentiality of the program in order to perform 
update-in-place optimizations. 

In addition to these graph-based approaches, there have been a number of abstract interpretation- 
based analysis frameworks that are interesting because they also do not assume a sequential inter- 
preter. The work by Young and O'Keefe [45] and the work by Aiken and Murphy [1] fall into this 
category. 

Young and O'Keefe developed a type evaluator for a lazy dialect of Scheme. This evaluator computes 
an approximation to the set of possible values to which each expression in a program could evaluate. 
The only data structure that they considered was untyped pairs. In order to analyze recursive
functions on lists, their analyzer approximated infinite sets of values by cyclic type representations. 
Although the evaluator described by Young and O'Keefe yields an approximation of the values 
computed by each expression in a program, it is not viable for use in lifetime analysis because there 
is no way to determine the sharing or reachability of objects from any expression in the program. 

Aiken and Murphy developed a similar type inferencer for the strict functional language FL. Their 
approach uses type expressions as the abstract value domain and a set of rewrite rules to give the 
operational semantics of FL. The language of type expressions includes a fix operator that defines 
an infinite set of regular tree types by a finite representation. These recursive type expressions are 
used when deriving the type of recursive functions. 

Aiken and Murphy's type inferencer uses the rewrite rules as constraints in a proof system to derive 
the types of FL expressions. In the case of recursive functions, heuristics must be used to choose 
which rewrite rule to apply, because more than one rewrite rule may be applicable to a given 
instance of a recursive function. 

Park and Goldberg [37] developed an analysis framework based on abstract interpretation of a 
higher-order functional language. Their framework computes an approximation of how much of a 
nested list value passed to a function escapes as part of the result of that function. They did not 
precisely define the standard semantics that they were abstracting. 

Jones and Le Metayer [28] developed three analyses framed as abstract interpretations of programs: 
sharing, transmission, and necessity analysis. These analyses are defined for an expression-oriented 
language with lists as the only data structure. Jones and Le Metayer did not state precisely the 
standard semantics corresponding to the abstract semantics used in the analyses, and so it is 
difficult to see how to generalize this method to other data structures. 

There are two problems with the last two approaches to determining object lifetimes. The first is 
that they do not have a good correspondence with any standard semantics. The point of methods 
based on abstract interpretation is that the analyses can be shown to be safe with respect to the 
standard semantics. The second problem is that objects are not named, and so the analyses fail if 
the source languages are made imperative or non-strict, because there is no way to handle cyclic 
structures in these frameworks. 

1.2.5 Analyses Based on Type Deduction 

There is one more semantics-based approach to analysis that defines the analysis in terms of type 
deduction or type checking using a non-standard type system. Lucassen and Gifford [31] define a 
type and effects system for the FX language [17] that can be used to determine the lifetimes of 
objects. FX-87, based on the second-order lambda-calculus, has a kind system consisting of type
and effect annotations. The effect annotations describe which regions are allocated into, written 
to, or read from during the execution of an expression. Effect annotations on procedure values 
describe not only the effects incurred by evaluating the procedure value, but also the latent effects 
incurred by applying the procedure value to arguments. Lucassen and Gifford show how the effect 
descriptions can show that the lifetime of an object resulting from a particular expression has 
limited extent. 

This approach requires the user to annotate programs with type and effect declarations before the 
compiler can perform type and effect checking and lifetime analysis. Use of this approach would 
also allow the compiler to check the safety of explicit storage management in some cases. In later 
work Gifford et al. [18] extend the FX compiler to perform type and effect deduction, but in this 
effect system they dropped the information about storage regions. It is unclear from the paper 
whether there is an efficient or decidable algorithm for deducing types and effects with regions for 
FX programs. 

Baker [4] states that the structure-sharing unification algorithm for Milner-style type 
inference already produces a certain amount of sharing information for functional languages. Each 
node that represents the type of an expression in a program corresponds to a set of run-time 
objects. In a functional language, distinct type nodes represent disjoint sets of run-time objects, 
while unified type nodes represent overlapping sets of run-time objects. The advantage of using 
type inference for sharing analysis is that the algorithms for type inference are efficient enough to 
be used in production compilers. The disadvantage of this approach is that it cannot be extended 
to imperative programming languages without greatly increasing the complexity of the analysis. 

1.3 Storage Management Assumptions 

Let us assume that objects allocated by a program can be placed either in the activation frame of 
the procedure that allocated the object or on an implicitly or explicitly managed heap. Objects 
placed in procedure activation frames are automatically deallocated when the procedure terminates; 
consequently, the lifetime of these objects must be bounded by the lifetime of the procedure. In 
our implementation of Id, only fixed size objects may be frame-allocated because a procedure's 
activation frame cannot be extended once it has been allocated. 

We believe that the applications in which we are interested would suffer too much of a performance 
penalty if they depended solely on run-time garbage collection. One characteristic of these appli- 
cations is the use of large amounts of data, often held in large arrays. The behavior of garbage 
collectors in the presence of large, shallow or flat data structures is not well understood, but ap- 
plications typically manage these structures explicitly even though garbage collection is used to 
manage other structures. In these programs, most storage reclamation should be done explicitly, 
either by explicit deallocation or explicit reuse of structures. We would like to automate the process 
of explicitly managing these large structures. It is very difficult, and often impossible, for either a 
programmer or a compiler to explicitly reclaim all structures allocated, and so we will continue to 
have a garbage collector that reclaims the storage that cannot be reclaimed explicitly. 

This thesis does not explore the best ways for the heap manager and garbage collector to interact. 
The way explicit and implicit storage management interact depends to a large extent on the choice 
of garbage collection method and characteristics of the run-time system. One possibility is for the 
heap manager to allocate areas that are never garbage collected, and to use these areas for objects 
that are guaranteed to be deallocated eventually. The objects in these areas would never have to 
be copied by the garbage collector, and so we would save on the overhead of copying these objects. 
Another possibility is to use a reference counted garbage collector and to set the reference counts 
of objects whose lifetimes can be determined to one upon allocation and to zero when no longer 
needed, but not to perform reference counting operations on the objects otherwise. 

1.4 Cost of Storage Management 

It is not clear that a program will always have better performance running under an explicit storage 
manager than it will have running under a garbage collector. Appel [2] makes an argument that 
garbage collection can be faster than explicit storage management; he claims that it can even be 
faster than stack allocation. Appel's claim is that 

with enough memory on the computer, it is more expensive to explicitly free a cell than 
it is to leave it for the garbage collector — even if the cost of freeing a cell is only a 
single machine instruction. 

Appel gives the cost per reachable object of copying garbage collection as 

$$ g = \frac{(c_1 + c_2 s)\,A}{M/s - A} \qquad (1.1) $$

where $c_1$ is the number of operations required per object copied, $c_2$ is the number of operations 
per pointer, s is the average size of an object, A is the number of reachable objects when garbage 
collection is performed, and M is the size of the two memory spaces. If M is made sufficiently large 
relative to the other parameters, then the cost per reachable, or non-garbage, object can be made 
arbitrarily small. 
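
To make the dependence on M concrete, the following Python sketch evaluates Equation 1.1 for a few memory sizes. The parameter values are illustrative assumptions only; they are not measurements taken from Appel's paper or from this thesis.

```python
def gc_cost(c1, c2, s, A, M):
    """Cost g of Equation 1.1: (c1 + c2*s) * A / (M/s - A)."""
    denominator = M / s - A
    if denominator <= 0:
        raise ValueError("memory too small: M/s must exceed A")
    return (c1 + c2 * s) * A / denominator

# Hypothetical parameters: 10 operations per object, 3 per pointer,
# an average object size of 4 words, and 1000 reachable objects.
c1, c2, s, A = 10, 3, 4, 1000

# The cost shrinks as the memory size M grows.
costs = [gc_cost(c1, c2, s, A, M) for M in (8_000, 40_000, 400_000)]
```

With these assumed values the cost falls monotonically as M grows, and it is undefined once M/s no longer exceeds A, which corresponds to the regime in which memory is almost entirely in use.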

In the limiting case as the amount of available memory approaches infinity, Appel asserts that it is 
cheaper to rely on garbage collection than explicit storage management, even stack management, 
because the garbage collector will never have to run. At the other extreme, as the amount of memory 
approaches the average amount of memory in use at any time, the cost of garbage collection goes to 
infinity. In order to determine the crossover point where the cost of implicit memory management 
is less than the cost of explicit memory management, we must know the average amount of memory 
used by a program and the time constants $c_1$ and $c_2$ associated with garbage collection, relative to 
the cost of explicit storage management. 

Is it reasonable to assume, as Appel does, that we will be operating in the large-memory regime 
where the cost of garbage collection is insignificant? Although the cost per word of memory 
is continuously decreasing, the amount of memory needed for interesting problems seems to be 
increasing just as fast. It seems that the cost of garbage collection will be significant for the class of 
programs considered in this thesis because large programs will operate in the memory management 
regime where most of memory is in use and garbage collection is expensive. Nevertheless, the cost 
of explicitly allocating and deallocating an object by a general heap manager is very high, and so 
care must be taken to reduce the number of calls to the general heap manager. For this reason, 
we will consider some approaches to reusing storage directly or allocating objects in procedure 
activation frames. 

Appel does not consider the effect of locality on program execution time. Moon [32] states that 
the most important responsibility of a garbage collector in a system using virtual-memory is to 
keep data structures local; actually reclaiming storage is a secondary responsibility in this case. If 
a program has little locality of reference because it uses objects spread over a very large amount 
of memory, then the performance of the program will be very poor if the virtual-memory system 
thrashes. 

Is there some way for explicit storage management to cooperate with garbage collection? Many of 
the strict, functional languages use a reference counting garbage collector because these languages 
cannot create cyclic data structures. If a reference counting garbage collector is used, then reference 
counting of objects whose lifetime is known need not be performed. The reference count will be 
set to one when the object is created and set to zero when the object's lifetime is over. The Id 
run time system is likely to use a copying garbage collector, so that it can reclaim circular objects. 
def f = 
{ x = MakeTuple(6,847); 
  r = Select1(x); 
  --- 
  Dealloc(x) 
in r } 



Figure 1.1: Dealloc in a block expression 



Explicit storage allocation and deallocation can cooperate with a copying garbage collector by 
allocating and deallocating objects in a separate region. Garbage collection will not be performed 
on this region until all objects within it are garbage, in which case, the region may be used as free 
storage. Alternatively, this region can be treated as an older generation, and garbage collected 
infrequently, with promotion suppressed. 

1.5 Explicit Storage Deallocation in Id 

The first step in this work was to allow programmers to perform explicit storage management in 
Id. We introduced an experimental feature into the language for explicit deallocation of structures. 

The Dealloc primitive, along with --- (local barrier synchronization), allows Id programmers to 
insert commands that deallocate the storage associated with an object when that object is no longer 
in use. 

Programmer-directed deallocation will be performed to determine the costs and benefits of explicit 
deallocation in terms of program performance and the problem sizes that may be run without 
exhausting memory or invoking the garbage collector. 

The Dealloc primitive explicitly deallocates the storage associated with a structure in Id. In order 
to use the Dealloc primitive, we must have proper synchronization that prevents the Dealloc from 
executing until all uses of the structure to be deallocated have executed. For that reason, we have 
also introduced a barrier synchronization construct, denoted by three or more dashes: ---. 

In Id, unlike other parallel languages, a barrier is a local synchronization. A barrier can only appear 
within a letrec block, and its effects are limited to that letrec block. A barrier in Id ensures 
that the code in the block bindings before the barrier executes to termination before the code in 
the block bindings after the barrier. We will define a control region to be the program region 
containing a group of block bindings delimited by barriers. In Id, a control region terminates when 
all computation threads have exited the control region. In other words, all values in the region 
have been produced and all side effects have been performed. 

The example in Figure 1.1 contains a block with one control region consisting of the bindings of x 
and r. In this example, the object to which x is bound will be deallocated when the computations 
in both bindings in the control region have terminated. 
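
The behavior described for Figure 1.1 can be mimicked in Python. In this sketch the bindings of the control region run concurrently, the barrier waits for all of them, and only then is the structure deallocated. The names heap, make_tuple, select, and dealloc are hypothetical, and threads are only a stand-in for Id's parallel evaluation; non-strictness is not modeled.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor, wait

heap = {}                     # hypothetical heap: label -> tuple contents
labels = itertools.count()    # source of fresh labels

def make_tuple(*fields):
    label = next(labels)
    heap[label] = fields
    return label

def select(label, i):
    # A KeyError here would correspond to a dangling reference.
    return heap[label][i - 1]

def dealloc(label):
    del heap[label]

def f():
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Control region: the bindings of x and r may run in parallel.
        x = pool.submit(make_tuple, 6, 847)
        r = pool.submit(lambda: select(x.result(), 1))
        wait([x, r])            # barrier: the control region has terminated
        dealloc(x.result())     # executes only after every use of x
        return r.result()

result = f()
```

Moving the dealloc call above the wait would allow it to race with the selection, which is the kind of error the barrier is intended to rule out.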

Invocation and termination of control regions are partially ordered. Invocation is the point in time 
when the interpreter first begins executing a portion of a control region, and termination is the 
point in time at which the interpreter finishes executing all code in a control region. Naturally, 
termination of any control region always occurs after invocation of that control region. 
def f0() = 
{ x = MakeTuple(6,847);          % cr_0 
  r = { y = Select1(x);          % cr_1 
        z = Select2(x); 
        r1 = y + z; 
      in r1 } 
  --- 
  Dealloc(x)                     % cr_2 
in r } 


Figure 1.2: Statically nested control regions 



Control regions may be composed by enclosing one control region within another or by placing a 
barrier between two control regions. If control region $cr_0$ statically encloses control region $cr_1$, then 
the invocation of $cr_0$ must precede the invocation of region $cr_1$ and the termination of $cr_0$ must 
follow the termination of $cr_1$. 

Definition 1.1 (Barrier Relation) The relation ($cr_0 \circ cr_1$) holds if control region $cr_0$ is statically 
separated from $cr_1$ by a barrier and $cr_0$ comes before the barrier textually. 



If control region $cr_0$ is separated statically from control region $cr_1$ by a barrier, and $cr_0$ comes 
before $cr_1$, then both the invocation and termination of $cr_0$ must precede the invocation of $cr_1$. 

Consider the body of procedure f0 in Figure 1.2. In this example there are three control regions: 
region $cr_0$, which is composed of the bindings of x and r; region $cr_1$, which is composed of the 
bindings of y and z; and region $cr_2$, which is composed of the deallocation command. Region $cr_0$ 
encloses region $cr_1$; therefore, the invocation of $cr_0$ precedes that of $cr_1$, and the termination of $cr_1$ 
precedes that of $cr_0$. Region $cr_0$ is separated from region $cr_2$ by a barrier, and so both the invocation 
and termination of $cr_0$ must also precede the invocation of $cr_2$. The control region composition 
relations are transitive; therefore, the invocation and termination of region $cr_1$, enclosed by $cr_0$, 
must precede the invocation of region $cr_2$. 

The ordering of the invocation and termination of dynamically composed control regions follows 
from that of statically composed control regions. If control region $cr_0$ contains a procedure call, 
and $cr_1$ is the control region of the run-time instance of the body of that procedure call, then we 
say that $cr_0$ dynamically encloses $cr_1$. Therefore, the invocation of $cr_0$ will precede the invocation 
of $cr_1$ and the termination of $cr_0$ will follow the termination of $cr_1$. 

The example in Figure 1.3 is similar to Figure 1.2, except that control region $cr_1$ is in the body 
of procedure g. In this example, control region $cr_1$ is dynamically enclosed within control region 
$cr_0$ because procedure g is called from within control region $cr_0$. Therefore, the partial ordering of 
invocation and termination of control regions will be the same as in the previous example. Clearly, 
we must be able to name dynamic instances of control regions if we are going to be able to talk about 
the ordering of invocation and termination of those regions. This naming of dynamic instances is 
one of the topics we will discuss in more detail later in this thesis. 
def f0() = 
{ x = MakeTuple(6,847);          % cr_0 
  r = g(x); 
  --- 
  Dealloc(x) 
in r } 

def g(x) = 
{ y = Select1(x);                % cr_1 
  z = Select2(x); 
  r1 = y + z; 
in r1 } 



Figure 1.3: Dynamically nested control regions 



If one control region statically or dynamically encloses another, then the lifetime of the outer region 
will completely include the lifetime of the inner region. On the other hand, if two control regions 
are separated by a barrier, then the lifetime of the first will completely precede the lifetime of the 
second control region. 
From these two properties we determine that control regions form a natural tree. 
Definition 1.2 (Ancestor Relation) Control region $cr_0$ is an ancestor of region $cr_1$, or 

    $cr_0 = cr_1\uparrow$ 

if $cr_0$ statically or dynamically encloses region $cr_1$. 

In addition, we say that control region $cr_1$ is a descendent of region $cr_0$ if $cr_0$ is an ancestor of 
region $cr_1$. Every control region $cr_0$ will be considered to be an ancestor and a descendent of itself: 

    $cr_0 = cr_0\uparrow$ 

The expression $cr_0\uparrow^{n}$, where $n > 0$, refers to the $n$th ancestor of region $cr_0$. 
We will now define two precedence relations on control regions. 

Definition 1.3 (Invocation Precedence) The invocation precedence relation 
($cr_0 \prec_I cr_1$) is defined as follows: 

    $$ cr_0 \prec_I cr_1 \;\equiv\; (\exists n.\ cr_0 = cr_1\uparrow^{n}) \;\lor\; (\exists n_0, n_1.\ cr_0\uparrow^{n_0} \circ\, cr_1\uparrow^{n_1}) $$ 

where $\circ$ is the barrier relation of Definition 1.1. If ($cr_0 \prec_I cr_1$), then control region $cr_0$ 
must be invoked before $cr_1$ may be invoked. 

Definition 1.4 (Termination Precedence) The termination precedence relation ($cr_0 \prec_T cr_1$) 
is defined as follows: 

    $$ cr_0 \prec_T cr_1 \;\equiv\; (\exists n.\ cr_0\uparrow^{n} = cr_1) \;\lor\; (\exists n_0, n_1.\ cr_0\uparrow^{n_0} \circ\, cr_1\uparrow^{n_1}) $$ 

If ($cr_0 \prec_T cr_1$), then control region $cr_0$ must terminate before $cr_1$ may terminate. 
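
The ancestor, invocation precedence, and termination precedence relations can be sketched over an explicit tree of control regions. The class and function names below are hypothetical. The sketch assumes that every region counts as its own ancestor and that each barrier-separated pair of regions is recorded explicitly with the barrier function.

```python
class Region:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent   # statically or dynamically enclosing region
        self.after = set()     # regions that follow this one across a barrier

    def ancestors(self):
        """Yield this region, then each enclosing region in turn."""
        region = self
        while region is not None:
            yield region
            region = region.parent

def barrier(first, second):
    """Record that first is separated from second by a barrier."""
    first.after.add(second)

def barrier_between(a, b):
    """True if some ancestor of a precedes some ancestor of b across a barrier."""
    b_ancestors = set(b.ancestors())
    return any(x.after & b_ancestors for x in a.ancestors())

def invoked_before(a, b):
    """Invocation precedence: a is an ancestor of b, or a barrier orders them."""
    return a in b.ancestors() or barrier_between(a, b)

def terminates_before(a, b):
    """Termination precedence: b is an ancestor of a, or a barrier orders them."""
    return b in a.ancestors() or barrier_between(a, b)

# The three control regions of Figure 1.2.
cr0 = Region("cr0")
cr1 = Region("cr1", parent=cr0)
cr2 = Region("cr2")
barrier(cr0, cr2)
```

For the regions of Figure 1.2, invoked_before(cr1, cr2) and terminates_before(cr1, cr2) both hold because the enclosing region cr0 precedes cr2 across the barrier.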



1.6 Safety of Explicit Deallocation 

Note that the use of the Dealloc primitive is inherently unsafe. The programmer may try to 
deallocate structures that are shared between various parts of the program, causing all sorts of 
errors to occur. Just as in other languages with explicit allocation and deallocation, Dealloc 
introduces the possibility of dereferencing dangling pointers. Therefore, the programmer must 
analyze his program to verify the safety of each explicit deallocation performed. In this section, 
we will see a set of informal conditions that must be met in order to safely deallocate storage in a 
program. 

Conceptually, there is a single condition that must be satisfied in order to safely deallocate or reuse 
an object. An object may be deallocated when there are no further references to the object. In an 
implicitly managed system, an object will be deallocated when there are no live references to the 
object. A live reference to a data structure is a reference, or pointer, that is stored in either the 
activation frame of a procedure invocation, or in a static variable or in a data structure to which 
there is a live reference. In a system in which the programmer must explicitly manage storage, an 
object may be deallocated as soon as it can be guaranteed that no further use will be made of the 
object, and so the lifetime of objects may be shorter than in a system with garbage collection. 

One way to guarantee that there will be no further references to an object is to put the deallocation 
command in a control region that executes after all uses of the object execute. 
Here is a condition that describes when it is safe to deallocate an object. 

Condition 1.5 (Deallocation Safety) Given two control regions $cr_0$ and $cr_1$ and a variable x 
and structure ol bound to x in both regions, it is safe to deallocate the structure ol in control region 
$cr_1$ if 

1. $\forall cr_r.\ UsedIn(ol, cr_r) \Rightarrow \exists n.\ cr_0 = cr_r\uparrow^{n}$, 

2. $cr_0 \circ cr_1$, that is, $cr_0$ is separated from $cr_1$ by a barrier (Definition 1.1), 

3. the only use of ol in region $cr_1$ is in the deallocation of ol, and 

4. the only deallocation of ol is in region $cr_1$, 

where $UsedIn(ol, cr)$ is true if object ol is allocated or dereferenced in control region cr. 

The first two subconditions of Condition 1.5 guarantee that all uses of the structure bound to 
variable x in control region $cr_0$ have terminated before the Dealloc in control region $cr_1$ can 
execute. The third subcondition guarantees that there are no references to the contents of x in $cr_1$ 
that may execute after x is deallocated, and the fourth subcondition guarantees that the structure 
bound to x is deallocated only once. 

Condition 1.5 is sufficient to ensure the safety of the deallocation of structure x. This condition 
is very conservative; it means that the control region containing the producer and all the control 
regions containing consumers of the structure have terminated before the deallocation statement 
executes. 
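
The four subconditions of Condition 1.5 can be checked mechanically once the uses of an object are known. The sketch below assumes a hypothetical event log that records, for each control region, whether an object was allocated, dereferenced, or deallocated there. Region names are plain strings, and a region is treated as its own ancestor.

```python
def ancestors(region, parent):
    """Yield region, then each enclosing region, following the parent map."""
    while region is not None:
        yield region
        region = parent.get(region)

def safe_to_dealloc(obj, cr0, cr1, parent, barriers, events):
    """Check the four subconditions for deallocating obj in region cr1.

    parent:   dict mapping a region to its enclosing region
    barriers: set of (before, after) region pairs separated by a barrier
    events:   list of (region, kind, obj), kind in {"alloc", "deref", "dealloc"}
    """
    mine = [(r, k) for (r, k, o) in events if o == obj]
    users = {r for (r, k) in mine if k in ("alloc", "deref")}
    deallocs = [r for (r, k) in mine if k == "dealloc"]
    cond1 = all(cr0 in ancestors(r, parent) for r in users)  # uses lie under cr0
    cond2 = (cr0, cr1) in barriers                           # barrier orders cr0, cr1
    cond3 = cr1 not in users                                 # cr1 only deallocates
    cond4 = deallocs == [cr1]                                # exactly one dealloc
    return cond1 and cond2 and cond3 and cond4

# Events corresponding to Figure 1.2: x is allocated in cr0, read twice
# in cr1, and deallocated in cr2 after the barrier.
parent = {"cr1": "cr0"}
barriers = {("cr0", "cr2")}
events = [("cr0", "alloc", "x"), ("cr1", "deref", "x"),
          ("cr1", "deref", "x"), ("cr2", "dealloc", "x")]
ok = safe_to_dealloc("x", "cr0", "cr2", parent, barriers, events)
```

Adding a dereference of x in cr2, or a second deallocation, makes the check fail, which matches the informal reading of the third and fourth subconditions.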

Figures 1.4 and 1.5 show procedures from the Gamteb [8] photon transport simulation benchmark. 
The procedure compton, shown in Figure 1.4, contains a Dealloc command that satisfies Condi- 
tion 1.5. The variable new_particle is bound to a newly allocated tuple in the first binding in 
the body of procedure compton; the structure to which new_particle is bound is deallocated in 
the control region after the barrier. Note that procedure transport_particle uses new_particle 
but does not store it anywhere or return it as a value. Procedure compton allocates nine words of 
storage for the new particle. Adding the Dealloc statement allows that storage to be reclaimed as 
soon as compton terminates. 

The procedure handle_collision, shown in Figure 1.5, contains a binding of variable t_particle 
to a structure allocated in the body of procedure photo_elect and used in the body of compton. 
The control regions in which this structure is allocated and used are all descendents of the control 
region in which t_particle is bound. The deallocation of the structure bound to t_particle in 
the control region after the barrier is safe because the allocation and all other uses of this structure 
occurred either in the control region before the barrier or in descendents of that control region, 
and so all of these uses must have terminated before the deallocation command is invoked. 

In the rest of this thesis, we show how to verify the safety of a deallocation command at run time. 
We show how to check the safety of deallocation commands at compile-time using a conservative 
approximation of when objects may be allocated, dereferenced, and deallocated. We also present 
an algorithm for inserting safe deallocation commands at compile-time. 
def transport_particle xsect_table bins particle prob = 
  { x,y,z, u,v,w, wt, e, e_bin, cell, seed = particle; 
    pcompton, ppair, pphoto, ptotal = prob; 

    d_surf, surface = dist_to_surface x y z u v w; 
    rand, randl = grand seed; 
    d_coll = dist_to_collision ptotal rand; 
    bin_counts = if (d_coll >= d_surf) then 
                   move_to_surface d_surf 
                 else % (d_coll < d_surf) 
                   handle_collision d_coll; 
  in 
    bin_counts}; 

defsubst compton particle d_coll xsect_table bins = 
  { %% Allocate a new particle, deallocate it in this context. 
    new_particle = (new_x,new_y,new_z, ••• new_seed ); 
    r = 
      if e_kill then 
        ••• 
      else 
        (transport_particle xsect_table bins 
                            new_particle new_prob) 
    --- 
    Dealloc new_particle ; 
  in r }; 



Figure 1.4: Procedure compton 
defsubst handle_collision d_coll = 
  { %% t_particle is allocated within photo_elect, 
    %% and deallocated in handle_collision. 
    t_particle, absorb, wt_kill = 
      photo_elect particle d_coll pphoto ptotal ; 
    counts = 
      if (not wt_kill) and (randl < p_compton) then 
        compton t_particle d_coll xsect_table bins 
      else 
        ••• 
    r = add_counts counts col_counts ; 
    --- 
    Dealloc t_particle ; 
  in r } 

Figure 1.5: Procedure handle_collision 
Chapter 2 



Problem Statement 



In order for the compiler to verify or insert explicit storage deallocation code in programs, it must 
be able to determine the lifetimes of the objects being deallocated. Thus, the compiler must have 
some notion of the run-time behavior of the program being compiled. In this thesis, the compiler 
will use an abstraction of the operational semantics to determine the lifetimes of objects. 

This thesis develops a method for lifetime analysis that is directly applicable to parallel, single- 
assignment languages. In particular, we will be using the language KID - as the basis for the 
analysis. 

The first step in developing our lifetime analysis algorithm is to define a standard operational 
semantics for the language of interest. One can define the operational semantics in terms of an 
abstract machine, in terms of a term rewrite system, or in terms of an interpreter. We define the 
operational semantics in terms of an interpreter because that allows us to stay close to the original 
source code of the program, rather than compiling into object code for the abstract machine. 

This chapter describes KID - syntax and gives its semantics in terms of an interpreter. In the 
first section, we define the notation used throughout this thesis. In the second section, we define 
the syntax of KID - programs and the value domains over which KID - programs operate. In the 
third section, we define the standard KID - interpreter and give examples of its operations. In the 
fourth section, we define the deallocation problem in terms of the KID - interpreter, and in the 
fifth section we will give an overview of the development of our solution in the rest of this thesis. 



2.1 Notation 

We will adopt the convention of using double brackets, $[\![ e ]\!]$, around program text. Environments 
will be represented by $\rho$, looking up variable x in environment $\rho$ will be represented by $\rho[x]$, 
and binding variable x to value v in environment $\rho$ by $\rho[v/x]$. Stores will be represented by $\sigma$, 
dereferencing a location $l$ in store $\sigma$ will be written as $\sigma[l]$, and binding location $l$ to value v in store 
$\sigma$ will be written as $\sigma[l \mapsto v]$. The expression $\mathcal{P}(X)$ indicates the powerset, or set of all subsets, 
of X. Tuples will be written with angle-brackets and elements separated by commas: $\langle v_1, \ldots, v_n \rangle$. 
Tagged structures will be written with a subscript tag, $\langle_{foo}\ v_1, \ldots, v_n \rangle$, and x.Tag will be used to 
refer to the tag of such a structure. The expression $D_\bot$ constructs a new domain that consists of 
the elements of domain D plus a new element $\bot$ which is less than all elements of D. 
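
The environment and store operations in this notation behave like functional updates on finite maps. The following Python sketch illustrates one possible reading; the function names are hypothetical, dictionaries stand in for environments and stores, and None stands in for the undefined element.

```python
def lookup(env, x):
    """Environment lookup: the value bound to variable x."""
    return env[x]

def bind(env, x, v):
    """Environment extension: a new environment with x bound to v."""
    return {**env, x: v}

def deref(store, l):
    """Store dereference: the value at location l, or None if l is unbound."""
    return store.get(l)

def update(store, l, v):
    """Store update: a new store with location l bound to v."""
    return {**store, l: v}

env0 = {}
env1 = bind(env0, "x", 3)
store0 = {}
store1 = update(store0, "l0", (6, 847))
```

Both bind and update return new maps and leave their arguments unchanged, so earlier environments and stores remain available, as they would in a mathematical definition.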
F                ::=  f_0 | f_1 | ...                                   User Function Names 
X                ::=  x_0 | x_1 | x_2 | ...                             Identifiers 
SE               ::=  1 | 2 | 3 | True | False | X                      Simple Expressions 
L                ::=  l_0 | l_1 | l_2 | ...                             Expression Labels 
$tag \in Tag$    ::=  {1, 2, 3, ...}                                    Oneof Tags 
OP               ::=  + | - | And | Or | ...                            Primitive Operators 
                   |  MakeTuple | Select_i 
                   |  MakeArray_F | Fetch 
                   |  MakeOneof_{Tag,N} | Select_{Tag,N} | Is?_{Tag} 
                   |  Cons | Nil | Hd | Tl | Nil? 
E                ::=  SE | L OP(SE, ..., SE)                            Expressions 
                   |  L F(SE, ..., SE)                                  Function Applications 
                   |  if (SE, BE, BE)                                   Conditionals 
Bs               ::=  X = BE; ...; X = BE                               Block Bindings 
Ds               ::=  Dealloc(X); ...; Dealloc(X)                       Deallocation Commands 
BE               ::=  {Bs --- Ds in X} | E                              Letrec Blocks 
$pr \in Prog$    ::=  {... F(X, ..., X) = BE; ...;}                     Programs 

Table 2.1: KID - syntax 



We use the adjective concrete to refer to values from the standard and instrumented value domains. 
These are values that arise during actual execution of a program. We use the adjective abstract 
to refer to values that arise during abstract interpretation of a program. These values summarize 
all the possible values that could arise during the execution of a program under the standard or 
instrumented interpreters. Hats on values ($\hat{v}$) or functions ($\hat{f}$) will be used to denote the abstraction 
of some value whenever it is not clear from the context that we are talking about an abstract value. 

The metalanguage in which the interpreter is written has strict semantics. Letrec blocks in the 
metalanguage are written "{ x = e in z }" and have recursive, i.e., letrec, scoping rules. The 
metalanguage can be viewed as a mathematical notation, in which there is no notion of order 
of evaluation. It can also be viewed as an abstract syntax for a functional language. All of the 
definitions written in this thesis could be written in a strict, functional language. 



2.2 Syntax of KID 



KID - is intended to be an intermediate language used when compiling Id programs. For this 
reason, it lacks some features that Id has, such as pattern matching, and so KID - programs can 
be rather verbose. KID - does not have loop expressions — in this work, we interpret and analyze 
them by translating them into tail recursive functions. 
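
As an illustration of the loop translation, the sketch below shows a simple summation loop next to a tail-recursive equivalent in which the loop variables become parameters. The example is written in Python rather than Id, and the function names are hypothetical. Both functions are defined at top level because nested function definitions are not assumed.

```python
def sum_to_loop(n):
    """Sum the integers 1..n with an explicit loop."""
    total, i = 0, 1
    while i <= n:
        total, i = total + i, i + 1
    return total

def sum_to_iter(n, total, i):
    """One call per loop iteration; the loop variables are parameters."""
    if i <= n:
        return sum_to_iter(n, total + i, i + 1)   # tail call: next iteration
    return total                                  # loop exit

def sum_to(n):
    """Entry point: supplies the initial values of the loop variables."""
    return sum_to_iter(n, 0, 1)
```

Python does not eliminate tail calls, so this form is limited by the recursion depth; the point here is only the shape of the translation.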

KID - is a sugared form of the lambda calculus. Functions are named, and a program consists of 
a top-level recursive block defining the functions in the program. This allows a concise expression 
of simple programs. The recursive scoping of the program block obviates the need for the Y 
combinator in the language. Expressions are either constants, variables, conditional expressions, 
or applications of functions or primitive functions. The syntax of KID - is shown in Table 2.1. 

A program consists of a recursively scoped block of function definitions. A program must define 
a function named $f_0$ that takes no arguments; this corresponds to the main function in a C 
program. Interpretation of a program begins by invoking this function. Nested functions are not 
allowed in this language. For a treatment of how to transform a set of nested function definitions 
to a flat set of function definitions, see the description of lambda lifting in [25]. Also, no currying, 
or partial application, of functions is supported. Hochheiser [21] describes the compilation of a 
language with currying into a KID-like intermediate form with no currying. Higher-order functions 
and closures will be discussed in Section 9.1. 

Because KID - is a first-order language, identifiers are separated into function identifiers F and 
value identifiers X. Simple expressions are either constants or identifiers. Expressions can be simple 
expressions, primitive operator applications, function applications, conditional or block expressions. 
Primitive and user function applications are labeled with a static label drawn from domain L, the 
set of static expression labels. This expression label will be used in the interpreter to identify 
objects and procedure activations. 

KID - expressions are divided into two major categories: simple expressions (SE) and expressions 
(E). The division simplifies many of the clauses of the interpreter, because simple expressions 
cannot modify or reference the store; they can only reference the environment. All expressions, 
except block and conditional expressions, consist of an operator and a number of simple expression 
parameters. In these expressions, each of the parameters can be evaluated by the simple expression 
evaluator, which does not take or return a store, thus reducing the number of stores that are 
defined. This reduces the clutter in the evaluator definition. Use of SE is even more pronounced 
in the instrumented and abstracted interpreters, where more values are returned by the expression 
evaluator. 
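
The benefit of separating simple expressions from expressions can be seen in a small evaluator sketch. The Python below is not the interpreter defined in this thesis; the class names, the use of the expression label as a store location, and the encoding of Select with its index as the first argument are all simplifying assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Const:
    value: object

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class PrimApp:
    label: str     # static expression label
    op: str
    args: tuple    # simple expressions only

def eval_se(se, env):
    """Simple expressions read only the environment; no store goes in or out."""
    if isinstance(se, Const):
        return se.value
    return env[se.name]

def eval_prim(e, env, store):
    """Evaluate a primitive application; only this level threads the store."""
    vals = [eval_se(a, env) for a in e.args]
    if e.op == "+":
        return vals[0] + vals[1], store
    if e.op == "MakeTuple":
        loc = e.label                       # simplification: label as location
        return loc, {**store, loc: tuple(vals)}
    if e.op == "Select":                    # encoded as Select(i, tuple) here
        i, loc = vals
        return store[loc][i - 1], store
    raise ValueError(f"unknown primitive {e.op}")

env = {"a": 6, "b": 847}
t, store1 = eval_prim(PrimApp("l0", "MakeTuple", (Var("a"), Var("b"))), env, {})
second, store2 = eval_prim(PrimApp("l1", "Select", (Const(2), Const(t))), env, store1)
```

Because arguments are simple expressions, evaluating them never produces an intermediate store, so each primitive application handles exactly one input store and one output store.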

Block expressions consist of a set of recursively scoped bindings, a synchronization barrier, and a 
set of deallocation statements. The interpretation of block expressions is rather involved because 
KID - is non-strict and because the scoping of variables in blocks is recursive, i.e., block expressions 
are letrec blocks. The result of a block expression is the value of the final identifier x in the block's 
inner environment and the block's inner store. The return value may be returned as soon as it is 
available — block expressions are non-strict and the result value is unaffected by the synchronization 
barrier. After all computation in the bindings has terminated, each of the deallocation statements 
is executed — the deallocation statements are hyperstrict in each of the bound variables. 

Anyplace a block expression is expected, a single expression may be used instead. This allows 
expressions such as 

{x =e; 
in x } 

to be written simply as e whenever x is not a free variable of expression e. 

The predicate of a conditional is a simple expression, but both branches must be block expressions. 
Also, the bodies of function definitions must be block expressions. This formulation of the syntax 
ensures that every structure that is allocated is initially bound to an identifier, because the only 
place that a structure allocation primitive can occur is on the right-hand side of a block binding. 

KID - has primitives for constructing and manipulating three types of aggregate objects. The 
primitive MakeTuple takes n arguments and constructs an n-tuple from their values. The primitive 
Select_i takes a tuple and returns the ith component. The primitive MakeArray_F takes a length 
parameter n and some additional arguments, and constructs an array of n elements where each 
element of the array is the value of function F applied to the index and the additional argument 
values. Fetch takes an array and an index and returns the corresponding element of the array. 
$n \in N$                  =  {1, 2, 3, ...}                                        Integers 
$b \in B$                  =  True + False                                          Booleans 
$l \in L$                  =  {l_0, l_1, l_2, ...}                                  Expression Labels 
$a \in AL$                 =  $\epsilon$ + AL.L                                     Activation Labels 
$ol \in OL$                =  AL:L                                                  Object Labels 
$v \in V$                  =  $(N + B + OL)_\bot$                                   Denotable Values 
$v_{tuple} \in Tuple$      =  $\langle_{Tuple}\ V, \ldots, V \rangle$               Tuples 
$v_{array} \in Array$      =  $\langle_{Array}\ n, V, \ldots, V \rangle$            Arrays 
$v_{oneof} \in Oneof$      =  $\langle_{n,n}\ V, \ldots, V \rangle$                 Oneofs 
$v_{list} \in List$        =  $\langle_{Cons}\ V, OL \rangle + \langle_{Nil} \rangle$   Lists 
$sv \in SV$                =  $(Tuple + Array + Oneof + List)_\bot$                 Storable Values 
$\sigma \in Store$         =  $OL \rightarrow SV$                                   Stores 


Table 2.2: Standard value domains 





Algebraic types are tagged sums of types. In KID - we represent algebraic types by oneofs, which 
are tagged sums of tuples. MakeOneof_{j,n_tags} takes m arguments and constructs a oneof tagged with 
j and m components that belongs to a type with $n_{tags}$ different disjuncts. The tag j of a oneof 
must be in the range $1 \le j \le n_{tags}$. Select_{j,i} takes a oneof and returns the ith component of that 
oneof, if the tag of that oneof was j. Is?_j takes a oneof and returns True if the tag of that oneof 
was j. 



2.3 KID⁻ Domains

KID⁻ has the usual types of values: integers, booleans, tuples, arrays, algebraic types (oneofs),
and lists. These value domains are defined in Table 2.2. Figure 2.2 contains the definitions of the
least-upper-bound operators on the standard value domains. Figure 2.3 contains the definitions of
the ordering operators on the standard value domains. The domains are all naturally ordered.

In order to model sharing of objects properly, we will use a store that maps unique labels to tuples.
A label unbound in a store will map to ⊥. Tuples will be passed by reference; we will refer to
tuples by their object labels. The actual tuple will reside in an associated store. A denotable value
that is a label of an object makes no sense without an associated store. Denotable values, drawn
from domain V, are either numbers, booleans, object labels, or ⊥ (undefined).

Object Labels 

In order to determine the lifetime of objects, we must be able to distinguish one object from another. 
Therefore, when objects are created they must be assigned a unique label. We will always refer to 
the objects by this label. 

There are many ways a unique label could be allocated for an object. We could use something 
like gensym to create arbitrary new, unique labels. However, in order to implement the non-strict 
interpreter, we must be able to deterministically generate a unique label for each instance of each 
allocation command. We will see later in this chapter that evaluating non-strict, recursive letrec
blocks involves fixpoint iteration over successive approximations of the recursive environment. Each
time we improve the approximation of the environment and store of a letrec expression, we must
get the same object labels for each object allocated in the block. Therefore the structure of labels
must be tied to the structure of the program in some manner so that when we summarize labels
we summarize information about particular parts of the program.

{ def f(x,y) =
    { t = ˡ⁰MakeTuple(x,y);
      result = ᵏ⁰g(t);
      Dealloc(t);
    in result };

  def g(t) =
    Select₁(t)
}

Figure 2.1: Labeled deallocation example

In order to name objects uniquely in our interpreter, we will use both a static label from the 
allocation primitives and a dynamic label identifying the particular invocation of that primitive. 
Therefore, the domain OL of object labels will consist of two components: a unique activation label 
and a static label that denotes the expression in the program that allocated the structure. 

Static labels are assigned to each expression in a KID⁻ program. We will only display the pertinent
labels on allocation primitives and function applications. These labels will be placed to the left and
above the expression that they annotate. In Figure 2.1, we have labeled the MakeTuple primitive
with l₀ by placing l₀ to the upper left of the MakeTuple expression. This label forms the static
portion of the object label of any tuple allocated by executing this particular expression.

Activation Labels 

We will call the dynamic portion of object labels their activation labels. Activation labels are drawn 
from the domain AL, whose structure will be described below. Our scheme for labeling activations 
is similar to that of Harrison. 

In [19], Harrison uses a pair consisting of a variable name and a procedure string to uniquely
name variable instances. In his system, every function expression (lambda abstraction) is uniquely
named statically, e.g., λ^{α₀}. The language he is modeling has call-cc, and this must be reflected
in procedure strings, which name a sequential execution path through a program. A procedure
string consists of a sequence of lambda names with a superscript of d or u to indicate the entrance
to or exit from an instance of that procedure, e.g., the procedure string α₀ᵈ α₁ᵈ α₁ᵘ indicates entering
the body of lambda α₀, entering the body of lambda expression α₁, followed by exiting the body
of lambda expression α₁. Harrison shows that these labels uniquely name every instance of every
object allocated in the program.

Harrison's scheme works well in a sequential language in which there is only a single thread of
control that can be named by the procedure string. However, in a parallel language such as KID⁻,
there is no sequential thread of control that can uniquely identify each object. Therefore, we will
use a variation of procedure strings based on the hierarchical rather than sequential ordering of
procedure activations. In this scheme, every expression within a function definition will be uniquely
labeled. An instantiation of a procedure f called from a procedure g executing in activation α can
be uniquely named by α followed by k_j, where k_j is the label of the expression within procedure g
that calls procedure f. Thus, an activation label is the concatenation of the names of the edges
in the run-time call tree, where each edge is labeled by the application expression that created that
edge. An important feature of these labels is that the label assigned to a particular instantiation of
a procedure invocation will always be the same regardless of the execution order of subexpressions.
Since object labels consist of an activation label paired with the expression label of the allocation
primitive expression, this feature carries over to object labels.

Our activation labels uniquely identify a particular invocation of a procedure during the execution
of a program. Activation labels AL denote a path down the call tree of a program. Activation
labels consist of a string of expression labels:

    ε.k₁. … .k_m

where ε is the empty activation label and each of the kᵢ is the expression label of a user function
application expression. We use strings of expression labels instead of strings of function names
because we must be able to distinguish two invocations of a single procedure within a given procedure
activation. Activation label ε is used as the activation label of the main body of the program. Each
time a procedure is called from activation α, we construct a new activation label by concatenating
the expression label k of the function application to α with a ".", yielding α.k.

For example, consider the definition of procedure fib, shown below. Procedure fib is recursive; it 
contains two calls to itself within the body. 



def fib(i) =
  { p = i < 2;
    r = if p
        then 1
        else { n1 = ᵏ¹fib(i-1);
               n2 = ᵏ⁰fib(i-2);
               n3 = n1 + n2
             in n3 }
  in r };

If we invoke fib(3) in activation α, then we get the following activation tree:

                 fib(3)
                   α
                 /   \
           fib(1)     fib(2)
            α.k₀       α.k₁
                      /    \
                fib(0)      fib(1)
               α.k₁.k₀     α.k₁.k₁
    v ⊔_V ⊥ = v
    ⊥ ⊔_V v = v
    v ⊔_V ⊤ = ⊤
    ⊤ ⊔_V v = ⊤

    v₁ ⊔_V v₂ = v₁    if v₁ = v₂
                ⊤     otherwise

    ⟨Tuple v₁, …, v_{n₁}⟩ ⊔_SV ⟨Tuple w₁, …, w_{n₂}⟩ =
        ⟨Tuple (v₁ ⊔_V w₁), …, (v_{n₁} ⊔_V w_{n₂})⟩       if n₁ = n₂
        ⊤                                                 otherwise

    ⟨Array n₁, v₁, …, v_{n₁}⟩ ⊔_SV ⟨Array n₂, w₁, …, w_{n₂}⟩ =
        ⟨Array n₁, (v₁ ⊔_V w₁), …, (v_{n₁} ⊔_V w_{n₂})⟩   if n₁ = n₂
        ⊤                                                 otherwise

    ⟨_{tag₁,m₁} v₁, …, v_{n₁}⟩ ⊔_SV ⟨_{tag₂,m₂} w₁, …, w_{n₂}⟩ =
        ⟨_{tag₁,m₁} (v₁ ⊔_V w₁), …, (v_{n₁} ⊔_V w_{n₂})⟩  if tag₁ = tag₂ ∧ m₁ = m₂ ∧ n₁ = n₂
        ⊤                                                 otherwise

    ⟨Cons v₁, v₂⟩ ⊔_SV ⟨Cons w₁, w₂⟩ = ⟨Cons (v₁ ⊔_V w₁), (v₂ ⊔_V w₂)⟩

    ⟨Nil⟩ ⊔_SV ⟨Nil⟩ = ⟨Nil⟩

    σ₁ ⊔_Store σ₂ = λol. σ₁[ol] ⊔_SV σ₂[ol]    if ol ∈ OL
                         ⊥                      otherwise

Figure 2.2: Least upper bound operators on standard value domains



Note that the activations labeled α.k₀, α.k₁, α.k₁.k₀, and α.k₁.k₁ can all proceed in parallel with
the parent. The only information we get from the activation labels is that parent activations
are initiated before their child activations and that parent activations terminate after their child
activations terminate.
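The labeling scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: activation labels are modeled as tuples of expression labels with the empty tuple standing for the starting activation, and the label names k0 (on the call fib(i-2)) and k1 (on the call fib(i-1)) are chosen to match the activation tree above. The `swap` flag is a hypothetical knob that reverses the order in which the two recursive calls are evaluated, to show that the labels do not depend on that order.

```python
# Activation labels are tuples of expression labels; () plays the role of ε.
K0, K1 = "k0", "k1"  # assumed labels: k0 on fib(i-2), k1 on fib(i-1)


def fib(i, act, tree, swap=False):
    """Return fib(i) and record which argument each activation label received."""
    tree[act] = i
    if i < 2:
        return 1
    calls = [(K1, i - 1), (K0, i - 2)]
    if swap:  # evaluate the two calls in the opposite order
        calls.reverse()
    # Each child activation is the parent's label extended by the call's label.
    results = {k: fib(arg, act + (k,), tree, swap) for k, arg in calls}
    return results[K1] + results[K0]


def show(act):
    """Render an activation label relative to the starting activation α."""
    return ".".join(("α",) + act)


tree, swapped = {}, {}
value = fib(3, (), tree)
fib(3, (), swapped, swap=True)
for act in sorted(tree, key=len):
    print(show(act), "runs fib(%d)" % tree[act])
```

Both evaluation orders assign the same label to each activation, which is the property the interpreter relies on when it re-evaluates expressions.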



2.4 KID⁻ Interpreter

In this section we give an operational semantics for KID⁻ in terms of a standard interpreter. First,
we discuss the evaluation strategy used by the interpreter, then we present the overall structure of
the interpreter, then we present the program evaluator, simple expression evaluator, and expression
evaluator. Next we discuss the correctness of the interpreter. Finally, we discuss the deallocation
problem and give an overview of our solution.
    ⊥ ⊑_V v    ∀v
    v ⊑_V ⊤    ∀v

    v₁ ⊑_V v₂ = True     if v₁ = ⊥
                True     if v₁ = v₂
                False    otherwise

    σ₁ ⊑_Store σ₂ = ⋀ᵢ (σ₁[lᵢ] ⊑_SV σ₂[lᵢ])

    ⟨Tuple v₁, …, v_{n₁}⟩ ⊑_SV ⟨Tuple w₁, …, w_{n₂}⟩ =
        ⋀ᵢ (vᵢ ⊑_V wᵢ)           if n₁ = n₂
        False                    otherwise

    ⟨Array n₁, v₀, …, v_{n₁−1}⟩ ⊑_SV ⟨Array n₂, w₀, …, w_{n₂−1}⟩ =
        ⋀_{0≤i<n₁} (vᵢ ⊑_V wᵢ)   if n₁ = n₂
        False                    otherwise

    ⟨_{tag₁,m₁} v₁, …, v_{n₁}⟩ ⊑_SV ⟨_{tag₂,m₂} w₁, …, w_{n₂}⟩ =
        ⋀ᵢ (vᵢ ⊑_V wᵢ)           if tag₁ = tag₂ ∧ m₁ = m₂ ∧ n₁ = n₂
        False                    otherwise

    ⟨Cons v₁, v₂⟩ ⊑_SV ⟨Cons w₁, w₂⟩ = (v₁ ⊑_V w₁) ∧ (v₂ ⊑_V w₂)

    ⟨Nil⟩ ⊑_SV ⟨Nil⟩ = True

Figure 2.3: Ordering operators on standard domains
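As an informal companion to the least-upper-bound and ordering operators, the following Python sketch implements them for one possible encoding. The encoding is an assumption made only for illustration: ⊥ and ⊤ are the strings "⊥" and "⊤", a storable value is a pair (header, components) where the header is for example ("Tuple",), ("Array", n), ("Oneof", tag, ntags), ("Cons",), or ("Nil",), and a store is a dictionary in which a missing label stands for ⊥.

```python
BOT, TOP = "⊥", "⊤"  # hypothetical encodings of bottom and top


def same(v, w):
    """Equality that keeps True and 1 apart (Python treats them as equal)."""
    return type(v) is type(w) and v == w


def lub_v(v, w):
    """Least upper bound of two denotable values."""
    if same(v, BOT):
        return w
    if same(w, BOT):
        return v
    return v if same(v, w) else TOP


def leq_v(v, w):
    """v ⊑ w for denotable values."""
    return same(v, BOT) or same(w, TOP) or same(v, w)


def lub_sv(s, t):
    """Least upper bound of storable values: BOT, TOP, or (header, components)."""
    if s == BOT:
        return t
    if t == BOT:
        return s
    if s == TOP or t == TOP:
        return TOP
    (hs, cs), (ht, ct) = s, t
    if hs != ht or len(cs) != len(ct):
        return TOP  # different constructors, tags, or sizes
    return (hs, tuple(lub_v(a, b) for a, b in zip(cs, ct)))


def leq_sv(s, t):
    """s ⊑ t for storable values."""
    if s == BOT or t == TOP:
        return True
    if s == TOP or t == BOT:
        return False
    (hs, cs), (ht, ct) = s, t
    return hs == ht and len(cs) == len(ct) and all(map(leq_v, cs, ct))


def lub_store(s1, s2):
    """Pointwise least upper bound; an unbound label counts as BOT."""
    return {ol: lub_sv(s1.get(ol, BOT), s2.get(ol, BOT)) for ol in set(s1) | set(s2)}


def leq_store(s1, s2):
    """s1 ⊑ s2: every binding of s1 is approximated from above by s2."""
    return all(leq_sv(sv, s2.get(ol, BOT)) for ol, sv in s1.items())
```

For instance, joining the tuples ⟨Tuple 2, ⊥⟩ and ⟨Tuple ⊥, 2⟩ in this encoding gives ⟨Tuple 2, 2⟩, while joining a tuple with a cons cell gives ⊤.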



2.4.1 Evaluation Strategy 

This interpreter is somewhat novel in that it evaluates each expression more than once in order to
implement the non-strictness of the KID⁻ language. Typically, an interpreter will evaluate each
expression exactly once.

Consider the following KID⁻ fragment, which uses non-strictness to define the second component
of the tuple in terms of the first component of the tuple.

{ a = ˡ⁰MakeTuple(x,y);
  x = 2;
  y = Select₁(a);
in a }

There is no order in which we can evaluate the three bindings of this expression so that a single
pass completely specifies the expression. The evaluation strategy used by this interpreter is to repeatedly
evaluate subexpressions in successively improved environments and stores until a limit is reached
and the expression is fully evaluated. The interpreter will first approximate the environment of the
body of the block expression by creating an environment in which each of the bound variables is
bound to ⊥. Then it will evaluate each of the right-hand-side expressions in that environment to
yield new approximations to the values of the bound variables and new approximations to the value
of the store. This process is repeated until both the environment and the store have stabilized.

For this example, the interpreter would start with environment ρ⁰ and store σ⁰:

    ρ⁰ = [ a ↦ ⊥, x ↦ ⊥, y ↦ ⊥ ]
    σ⁰ = ⊥_Store

After evaluating each of the right-hand sides in environment ρ⁰ and store σ⁰ and combining the
results into a new environment and store, we get ρ¹ and σ¹:

    ρ¹ = [ a ↦ α:l₀, x ↦ 2, y ↦ ⊥ ]
    σ¹ = [ α:l₀ ↦ ⟨Tuple ⊥, ⊥⟩ ]

in which variables a and x have non-bottom bindings and label α:l₀ is bound to a tuple containing
⊥ and ⊥.

One more iteration would yield ρ² and σ²:

    ρ² = [ a ↦ α:l₀, x ↦ 2, y ↦ ⊥ ]
    σ² = [ α:l₀ ↦ ⟨Tuple 2, ⊥⟩ ]

Now the first component of the tuple labeled α:l₀ contains 2.

Yet one more iteration would yield ρ³ and σ³:

    ρ³ = [ a ↦ α:l₀, x ↦ 2, y ↦ 2 ]
    σ³ = [ α:l₀ ↦ ⟨Tuple 2, ⊥⟩ ]

in which variable y is now bound to 2.

Finally, we would reach the environment ρ⁴ and store σ⁴ of the completely evaluated block
expression:

    ρ⁴ = [ a ↦ α:l₀, x ↦ 2, y ↦ 2 ]
    σ⁴ = [ α:l₀ ↦ ⟨Tuple 2, 2⟩ ]

in which all three variables have non-bottom bindings and the tuple has no bottom components.

We can tell that environment ρ⁴ and store σ⁴ have reached the fixpoint by iterating the process
one more time. This iteration yields the same result as the previous iteration; therefore, ρ⁴ and σ⁴
must be the complete value.

    ρ⁵ = [ a ↦ α:l₀, x ↦ 2, y ↦ 2 ]
    σ⁵ = [ α:l₀ ↦ ⟨Tuple 2, 2⟩ ]



The important thing to notice is that each expression was evaluated five times in order to reach 
the fixpoint. The evaluation strategy we have chosen had some effect on how we named objects. 
We had to be able to deterministically assign a label to an object in order to evaluate MakeTuple 
expressions multiple times. 
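The iteration for this three-binding block can be replayed with a small Python sketch. It is hard-wired to this one example and rests on assumptions made only for illustration: ⊥ is modeled as None, the object label α:l₀ is the pair ("α", "l0"), and the join on values assumes that successive approximations never conflict. It is not the interpreter itself, only a trace of the same fixpoint computation.

```python
BOT = None  # stands for ⊥ in this sketch
OL = ("α", "l0")  # the object label α:l0 of the one tuple allocated by the block


def join(old, new):
    """Join on a flat domain, assuming old and new never disagree."""
    return new if old is BOT else old


def step(env, store):
    """Evaluate all three right-hand sides once in the given approximation."""
    old = store.get(OL, (BOT, BOT))
    fresh = (env["x"], env["y"])  # a = MakeTuple(x, y)
    new_store = {OL: tuple(join(o, n) for o, n in zip(old, fresh))}
    a = env["a"]
    rhs = {
        "a": OL,
        "x": 2,  # x = 2
        # y = Select_1(a) reads the incoming store, not the new one
        "y": BOT if a is BOT else store.get(a, (BOT, BOT))[0],
    }
    new_env = {name: join(env[name], value) for name, value in rhs.items()}
    return new_env, new_store


def iterate():
    """Repeat step until neither the environment nor the store changes."""
    state = ({"a": BOT, "x": BOT, "y": BOT}, {})
    trace = [state]
    while True:
        nxt = step(*state)
        trace.append(nxt)
        if nxt == state:
            return trace
        state = nxt


trace = iterate()
print(len(trace) - 1, "evaluation rounds")
```

Entry i of `trace` corresponds to the pair of environment and store after i rounds, and the last two entries are equal because the final round only confirms the fixpoint.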

Most interpreters for non-strict languages use a rewrite system, where subexpressions are rewritten 
when they are evaluated. We chose this evaluation strategy because when we abstract the inter- 
preter we want the compiler to analyze programs by recursively descending the program, evaluating 
as it goes along. We want the program being evaluated to have the same text as the program being 
annotated or verified; so we do not want to use a rewriting interpreter. 

Arrays 

Here is an example of the use of MakeArray.

{
  def g1(i, x, y) =
    ˡ⁰MakeTuple(x,y,i);

  def f1(n, x, y) =
    ˡ¹MakeArray_g1(n, x, y);
}

This example consists of two procedures. Procedure g1, which takes three values, allocates a
three-tuple containing the three values and returns it as its result. Procedure f1 uses MakeArray
to construct an n-element array with vᵢ as the ith element of the array:

    vᵢ = g1(i, x, y)

where 0 ≤ i < n.

The value v and store σ resulting from a call to procedure f1 with values 3, 22, and 23 in activation
α would be:

    v = α:l₁

    σ = [ α:l₁       ↦ ⟨Array 3, α.l₁.0:l₀, α.l₁.1:l₀, α.l₁.2:l₀⟩
          α.l₁.0:l₀  ↦ ⟨Tuple 22, 23, 0⟩
          α.l₁.1:l₀  ↦ ⟨Tuple 22, 23, 1⟩
          α.l₁.2:l₀  ↦ ⟨Tuple 22, 23, 2⟩ ]




Please note that procedure g1 is invoked with the index i of the array element as well as the values
of the two extra parameters passed to MakeArray. Any number of additional parameters may be
passed to the element creation function through the extra parameters to MakeArray. These extra
parameters increase the expressiveness of the language without having higher-order functions.
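The way MakeArray threads the index, the extra parameters, and a fresh activation label to the element function can be sketched as follows. This is a strict, sequential approximation written only to show the labeling: the store is a dictionary mutated in place, activation labels are tuples, object labels are pairs (activation, expression label), and the helper names `make_tuple` and `make_array` are hypothetical. It does not model non-strictness.

```python
def make_tuple(store, act, label, *values):
    """Bind the object label act:label to a new tuple and return the label."""
    ol = (act, label)
    store[ol] = ("Tuple",) + values
    return ol


def g1(i, x, y, act, store):
    return make_tuple(store, act, "l0", x, y, i)


def make_array(f, n, extra, act, label, store):
    """Element j is f(j, *extra), run in the child activation act.label.j."""
    elems = tuple(f(j, *extra, act + (label, j), store) for j in range(n))
    ol = (act, label)
    store[ol] = ("Array", n) + elems
    return ol


def f1(n, x, y, act, store):
    return make_array(g1, n, (x, y), act, "l1", store)


store = {}
v = f1(3, 22, 23, ("α",), store)
for ol, sv in store.items():
    print(ol, "->", sv)
```

Each element tuple is allocated by the same expression l0, yet the three object labels differ because the element activations α.l1.0, α.l1.1, and α.l1.2 differ.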

Algebraic Types 

In some cases, we would like to represent a value whose type is one of a number of different types. 
In this case, we use an algebraic type, which is a disjoint union of the types. We will refer to 
algebraically typed objects as oneofs. In order to maintain type safety, a disjunct tag is maintained 
on these objects, and special operators are provided to construct and manipulate them. 

For example, consider the transaction algebraic type defined below:

    type transaction = deposit I | withdrawal I

A transaction is either a deposit or a withdrawal.

Transactions are represented by tagged structures in the standard semantics. A transaction object
is either a deposit:

    ⟨_{0,2} n⟩

where the subscript 0,2 indicates disjunct 0 of a type with two disjuncts, or a withdrawal:

    ⟨_{1,2} m⟩

where the subscript 1,2 indicates disjunct 1 of a type with two disjuncts. Any particular
transaction value will be either a deposit or a withdrawal.

The KID⁻ code to create and manipulate structures of the transaction type using these primitives
would look like:

def make_deposit(n) =
    ˡ⁰MakeOneof_{0,2}(n);

def make_withdrawal(m) =
    ˡ⁰MakeOneof_{1,2}(m);

def deposit_amount(d) =
    if Is₀(d)
    then Select_{0,1}(d)
    else Error();

def withdrawal_amount(w) =
    if Is₁(w)
    then Select_{1,1}(w)
    else Error();

def deposit?(t) =
    Is₀(t);

where Error is a primitive that always returns bottom (and presumably drops the user into the
debugger).
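The transaction operations can be mirrored in Python to make the role of the tag concrete. The sketch below is an illustration under assumptions, not the KID⁻ semantics: it ignores the store and object labels, models ⊥ as None, counts fields from 1, and returns ⊥ directly where the code above calls Error. The names `Oneof`, `make_oneof`, `is_tag`, and `select` are hypothetical.

```python
from collections import namedtuple

BOT = None  # stands for ⊥
Oneof = namedtuple("Oneof", ["tag", "ntags", "fields"])


def make_oneof(tag, ntags, *fields):
    """Build a oneof; the tag must satisfy 0 <= tag < ntags."""
    if not 0 <= tag < ntags:
        raise ValueError("tag out of range")
    return Oneof(tag, ntags, fields)


def is_tag(tag, v):
    return v.tag == tag


def select(tag, i, v):
    """Field i (counted from 1, an assumption) if the tag matches, else ⊥."""
    return v.fields[i - 1] if v.tag == tag else BOT


def make_deposit(n):
    return make_oneof(0, 2, n)


def make_withdrawal(m):
    return make_oneof(1, 2, m)


def deposit_amount(d):
    return select(0, 1, d)


def withdrawal_amount(w):
    return select(1, 1, w)


print(deposit_amount(make_deposit(50)), withdrawal_amount(make_deposit(50)))
```

Selecting with the wrong tag yields ⊥ rather than a field of the other disjunct, which is how the tag maintains type safety.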

Some algebraic types are defined recursively. These types are used to represent things such as lists,
trees, and graphs. Because lists are used so often in functional programs, KID⁻ has primitives
specifically defined to create and manipulate list objects.

2.4.2 Interpreter Structure 

This section introduces the structure of the standard interpreter and defines several properties that
the interpreter satisfies. The interpreter consists of three semantic functions, SE, E, and VE, which
together interpret KID⁻ programs. The following are the signatures of the semantic functions that
make up the interpreter.

    SE : SE → Env → V                          Evaluates simple expressions
    E  : E → Env → Store → AL → (V × Store)    Evaluates expressions
    VE : Prog → (V × Store)                    Evaluates programs

where Env, the domain of environments, is defined as:

    Env = X → V.

Environments map identifiers, or variables, to values. An identifier that is unbound in an
environment maps to ⊥.

The function VE evaluates a program and returns a denotable value and a store as the result. The
function E takes an expression e, an environment ρ, a store σ, and an activation label α and returns
the denotable value and new store resulting from evaluating the expression e in ρ, σ, and α. We
call a triple consisting of an environment, a store, and an activation label a context: it contains
the contextual information necessary to interpret an expression.

Definition 2.1 (Dynamic Context) A dynamic context is a triple consisting of an environment, 
a store, and an activation label. 

In order to show that the interpreter is sound, i.e., that it terminates on well-behaved (non-looping)
programs, we must show that procedures SE and E are monotonic. Monotonicity is required to
show the existence of the fixpoints computed during evaluation of letrec blocks. The monotonicity
of E depends on a property called extensionality.

Definition 2.2 (Extensionality) A function f is extensive if

    ∀x ∈ Domain(f). x ⊑ f(x)

In other words, the result of f(x) always includes x; function f only adds information to its
argument.

We will have to show that E is extensive, that is, E only adds to the bindings of the store that it 
takes as input. The set of locations bound in the store resulting from a call to evaluator E will be 
a superset of the set of locations bound in the input store. 
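The extensionality condition on stores can be checked mechanically on sample inputs. The sketch below is only an illustration: stores are dictionaries from labels to tuples of flat components, ⊥ is None, and the two store transformers `alloc_pair` and `drop_all` are hypothetical stand-ins for an allocating step and for a step that discards bindings. Testing on samples does not prove the property; it only shows what the definition demands.

```python
BOT = None  # stands for ⊥


def leq(s1, s2):
    """s1 ⊑ s2 for stores whose values are tuples of flat components."""
    for ol, sv in s1.items():
        other = s2.get(ol)
        if other is None or len(other) != len(sv):
            return False
        if any(a is not BOT and a != b for a, b in zip(sv, other)):
            return False
    return True


def alloc_pair(store):
    """A hypothetical allocating step: it only adds a binding for a fresh label."""
    new = dict(store)
    new[("α", "l9")] = (1, BOT)
    return new


def drop_all(store):
    """A hypothetical step that forgets every binding, so it is not extensive."""
    return {}


def is_extensive_on(f, samples):
    """Check x ⊑ f(x) on each sample store."""
    return all(leq(s, f(s)) for s in samples)


samples = [{}, {("α", "l0"): (2, BOT)}]
print(is_extensive_on(alloc_pair, samples), is_extensive_on(drop_all, samples))
```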
Proposition 2.3 (E is extensive)

    ∀ρ ∈ Env, σ₀ ∈ Store, α ∈ AL,
    ∃⟨v, σ₁⟩ = E⟦ e ⟧ ρ σ₀ α,
    σ₀ ⊑ σ₁

The extensionality of E is used in the proof of the monotonicity of E.

Proposition 2.4 (E is monotonic)

    ∀ρ₀, ρ₁ ∈ Env, σ₀, σ₁ ∈ Store, α ∈ AL,
    ∃⟨v₀, σ₀′⟩ = E⟦ e ⟧ ρ₀ σ₀ α,
    ∃⟨v₁, σ₁′⟩ = E⟦ e ⟧ ρ₁ σ₁ α,
    (ρ₀ ⊑ ρ₁) ∧ (σ₀ ⊑ σ₁) ⟹ (v₀ ⊑ v₁) ∧ (σ₀′ ⊑ σ₁′)

2.4.3 KID⁻ Program Evaluator

The program evaluator VE evaluates the main function by invoking the expression evaluator with
the text of the body of the function, an empty environment, an empty store, and an empty activation
label. The definition of the program evaluator is given below:

    VE⟦ pr ⟧ =
        { ⟦ …; fᵢ(x_{i,1}, …, x_{i,nᵢ}) = eᵢ; … ⟧ = pr;
        in E⟦ e ⟧ ⊥_Env ⊥_Store ε }

where

    f₀() = e

is the definition of the main procedure f₀ in program pr.

The purpose of the program evaluator is to provide the initial environment to the expression 
evaluator so that it may evaluate the body of the program. Function identifiers are handled 
specially; they are not bound in the environment. A different model of program evaluation would 
yield an environment of functions, and one could invoke any of the procedures in the program 
with arbitrary arguments. We chose the whole program view because it is simple and because it 
is consistent with the approach of many systems where programs are compiled and run as a single 
unit with a single entry point. 

2.4.4 KID⁻ Simple Expression Evaluator

The simple expression evaluator takes a simple expression and an environment and returns a
denotable value. It is used by the expression evaluator. Numeric and boolean literals are evaluated to
numeric and boolean constants. Identifiers, or variables, are evaluated by looking them up in the
environment.

    SE⟦ n ⟧ ρ = n       where n is a number
    SE⟦ b ⟧ ρ = b       where b is a boolean
    SE⟦ x ⟧ ρ = ρ[x]    where x is a variable

Simple expressions cannot modify the store, so no store is passed into or returned from procedure
SE.




2.4.5 KID⁻ Expression Evaluator

The expression evaluator will now be defined as a dispatch function on the structure of the input
term. Remember that the expression evaluator takes an expression, an environment, a store, and
an activation label, and returns a value and a new store.

Evaluation of Simple and Primitive Expressions

The first three clauses of the interpreter define the semantics of constants and variables:

    E⟦ n ⟧ ρ σ α = ⟨ SE⟦ n ⟧ ρ, σ ⟩    where n is a number
    E⟦ b ⟧ ρ σ α = ⟨ SE⟦ b ⟧ ρ, σ ⟩    where b is a boolean
    E⟦ x ⟧ ρ σ α = ⟨ SE⟦ x ⟧ ρ, σ ⟩    where x is a variable

All three of these clauses call the simple expression evaluator, and in these clauses the input store
is returned unchanged because simple expressions cannot modify the store.

The next clause shows the evaluation of a simple arithmetic expression.

    E⟦ +(se₁, se₂) ⟧ ρ σ α = { v₁ = SE⟦ se₁ ⟧ ρ;
                               v₂ = SE⟦ se₂ ⟧ ρ
                             in ⟨ v₁ + v₂, σ ⟩ }

The two operands are evaluated first, then the primitive operator + is applied to those values,
and the result is returned. These primitive operators do not modify the store; consequently, it is
returned unchanged.

Evaluation of Function Applications 

We evaluate an application expression by evaluating its arguments and forming environment ρ′
and activation label α′. Environment ρ′ is obtained by extending the empty environment with
bindings from each of the formal parameters to their actual values. We concatenate activation
label α with the expression label k of the application expression to form the new activation label α′.
Then we evaluate the body of the function in environment ρ′, store σ, and activation label α′. The
non-strictness of functions and data structures is handled by the implementation of letrec blocks,
shown later in this section. So if we evaluate the body of a procedure before all of its arguments
have been evaluated, those arguments will be undefined (⊥) or partially defined (if they are bound
to labels of data structures).

    E⟦ ᵏf(se₁, …, seₙ) ⟧ ρ σ α = { v₁ = SE⟦ se₁ ⟧ ρ;
                                   …
                                   vₙ = SE⟦ seₙ ⟧ ρ;
                                   ρ′ = ⊥_Env[v₁/x₁, …, vₙ/xₙ];
                                   α′ = α.k;
                                 in E⟦ e ⟧ ρ′ σ α′ }

    where f(x₁, …, xₙ) = e is a definition in the program

Note that the body of function f is evaluated in a new environment and a new activation α.k
consisting of the current activation label α concatenated with the expression label k of the application
expression.
Evaluation of Conditionals

Conditionals are evaluated by first interpreting the predicate, and then interpreting one of the
branches of the conditional depending on the value of the predicate.

    E⟦ if(se, e₁, e₂) ⟧ ρ σ α = if SE⟦ se ⟧ ρ = True
                                then E⟦ e₁ ⟧ ρ σ α
                                else E⟦ e₂ ⟧ ρ σ α
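The clauses for simple expressions, arithmetic, function application, and conditionals can be collected into a small executable sketch. Everything about the encoding is an assumption made for illustration: expressions are Python tuples, ⊥ is None, activation labels are tuples, the program is a dictionary from function names to (parameters, body), and a simplified tuple allocation clause (without the join against the incoming store) is included only so that the effect of the activation label is observable. The conditional returns ⊥ when its predicate is ⊥, which is the assumption the monotonicity argument makes about the semantic conditional.

```python
BOT = None  # stands for ⊥


def SE(se, env):
    """Simple expressions: identifiers (strings) are looked up, literals are values."""
    return env.get(se, BOT) if isinstance(se, str) else se


def E(e, env, store, act, prog):
    """Evaluate e and return (value, store).

    Expression forms (hypothetical encoding):
      ("se", se)  ("+", se1, se2)  ("app", k, f, [se, ...])
      ("if", se, e1, e2)  ("mktuple", l, [se, ...])
    """
    kind = e[0]
    if kind == "se":
        return SE(e[1], env), store
    if kind == "+":
        v1, v2 = SE(e[1], env), SE(e[2], env)
        undefined = v1 is BOT or v2 is BOT
        return (BOT if undefined else v1 + v2), store
    if kind == "app":
        _, k, f, args = e
        params, body = prog[f]
        # The callee sees only its parameters: the empty environment, extended.
        new_env = {x: SE(a, env) for x, a in zip(params, args)}
        return E(body, new_env, store, act + (k,), prog)  # new activation α.k
    if kind == "if":
        p = SE(e[1], env)
        if p is BOT:
            return BOT, store
        return E(e[2] if p is True else e[3], env, store, act, prog)
    if kind == "mktuple":
        _, l, args = e
        ol = (act, l)  # object label α:l
        new_store = dict(store)
        new_store[ol] = ("Tuple",) + tuple(SE(a, env) for a in args)
        return ol, new_store
    raise ValueError("unknown expression form: %r" % (kind,))


prog = {"f": (["x", "y"], ("mktuple", "l0", ["x", "y"]))}
main = ("if", "p", ("app", "k0", "f", [1, "y"]), ("app", "k1", "f", [3, 4]))
print(E(main, {"p": False}, {}, (), prog))
```

The two application sites call the same function f, yet the tuples they allocate receive different object labels because the activation labels k0 and k1 differ.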



Evaluation of Block Expressions 



Evaluation of KID⁻ letrec blocks is rather complex because they have recursive scope and because
KID⁻ is non-strict. They are evaluated by solving the recursive equations resulting from interpreting
each of the binding right-hand sides in an environment that has the letrec block variables
bound to the values of the binding right-hand sides. This recursive equation is solved by fixpoint
iteration of function EvalBindings, starting with an initial approximation of the environment that
binds each of the xᵢ to bottom and an initial approximation of the store equal to the incoming
store.

After the bindings have been evaluated completely, the deallocation statements are executed. The 
deallocation statements have no effect in the standard interpreter, but they will be modeled more 
precisely in the instrumented interpreter in Chapter 3. 
where

    EvalBindings(⟦ x₁ = e₁; …; xₙ = eₙ ⟧, ρ, σ, α) =
        { ⟨v₁, σ₁⟩ = E⟦ e₁ ⟧ ρ σ α;
          …
          ⟨vₙ, σₙ⟩ = E⟦ eₙ ⟧ ρ σ α;
          ρ′ = ρ[(v₁ ⊔ ρ[x₁])/x₁, …, (vₙ ⊔ ρ[xₙ])/xₙ];
          σ′ = σ₁ ⊔ … ⊔ σₙ;
          ⟨ρ″, σ″⟩ = if ρ′ = ρ ∧ σ′ = σ
                     then ⟨ρ′, σ′⟩
                     else EvalBindings(⟦ x₁ = e₁; …; xₙ = eₙ ⟧, ρ′, σ′, α)
        in ⟨ρ″, σ″⟩ }

Evaluation of Tuple Primitives 

The next two clauses give the evaluation rules for tuple data structures. The primitive MakeTuple
takes m values and returns a structure containing those m values. This clause constructs a unique
object label ol by pairing the current activation label and the expression label of the MakeTuple
expression, and returns a store that has ol bound to the new tuple in the incoming store. The
object label is returned as the value of the expression. This clause only adds information to the
store, thus preserving the extensionality of E.

    E⟦ ˡMakeTuple(se₁, …, se_m) ⟧ ρ σ α = { v₁ = SE⟦ se₁ ⟧ ρ;
                                            …
                                            v_m = SE⟦ se_m ⟧ ρ;
                                            ol = α:l;
                                            v_tuple = ⟨Tuple v₁, …, v_m⟩;
                                            σ′ = σ[ol ↦ (v_tuple ⊔ σ[ol])];
                                          in ⟨ ol, σ′ ⟩ }

Tuple selection is accomplished by evaluating the argument to the Selectᵢ primitive, yielding an
object label, and looking up the value of that object label in the current store. The ith component
of that tuple is returned as a value, along with the current store.

    E⟦ Selectᵢ(se) ⟧ ρ σ α = { ol = SE⟦ se ⟧ ρ;
                               ⟨Tuple v₁, …, vₙ⟩ = σ[ol];
                             in ⟨ vᵢ, σ ⟩ }

Evaluation of Array Primitives 

The following three clauses give the evaluation rules for array data structure operators: MakeArray,
Fetch, and Bounds. The primitive MakeArray_{fᵢ} takes a simple expression that evaluates to length
n and r simple expressions that evaluate to values to pass to function fᵢ, and makes an array of
length n where the jth component is fᵢ(j, v₁, …, v_r). Note that this clause only adds information
to the store, thus preserving the extensionality of E.

    E⟦ ᵏMakeArray_{fᵢ}(se₀, se₁, …, se_r) ⟧ ρ σ α =
        { ol = α:k;
          n = SE⟦ se₀ ⟧ ρ;
          v₁ = SE⟦ se₁ ⟧ ρ;
          …
          v_r = SE⟦ se_r ⟧ ρ;
          ⟨u₀, σ₀⟩ = E⟦ eᵢ ⟧ (⊥_Env[0/x₀, v₁/x₁, …, v_r/x_r]) σ (α.k.0);
          …
          ⟨u_{n−1}, σ_{n−1}⟩ = E⟦ eᵢ ⟧ (⊥_Env[(n−1)/x₀, v₁/x₁, …, v_r/x_r]) σ (α.k.(n−1));
          v_array = ⟨Array n, u₀, …, u_{n−1}⟩;
          σ′ = σ[ol ↦ (v_array ⊔ σ[ol])] ⊔ (⨆_{0≤j<n} σ_j);
        in ⟨ ol, σ′ ⟩ }

    where fᵢ(x₀, x₁, …, x_r) = eᵢ is a definition in the program.

The primitive Fetch takes an array a and an index i, and returns the ith component of a.

    E⟦ Fetch(se₁, se₂) ⟧ ρ σ α =
        { ol = SE⟦ se₁ ⟧ ρ;
          i = SE⟦ se₂ ⟧ ρ;
          ⟨Array n, v₀, …, v_{n−1}⟩ = σ[ol];
        in ⟨ vᵢ, σ ⟩ }

The primitive Bounds takes an array and returns the length of the array.

    E⟦ Bounds(se) ⟧ ρ σ α =
        { ol = SE⟦ se ⟧ ρ;
          ⟨Array n, v₀, …, v_{n−1}⟩ = σ[ol];
        in ⟨ n, σ ⟩ }

Evaluation of Algebraic Type Primitives 

The following three clauses define the behavior of the interpreter on the primitives that allocate
oneofs, select components from oneofs, and test the tags of oneofs. The primitive MakeOneof_{tag,n_tags}
allocates a oneof whose tag is tag, which belongs to a type with n_tags tags, and whose elements
are the values of simple expressions se₁ through se_m. The primitive Is_tag returns True if the
tag of the oneof to which simple expression se evaluates is tag. The primitive Select_{tag,i} returns
the ith component of the oneof to which se evaluates if the tag of that object is tag; otherwise it
returns ⊥.

    E⟦ ˡMakeOneof_{tag,n_tags}(se₁, …, se_m) ⟧ ρ σ α =
        { v₁ = SE⟦ se₁ ⟧ ρ;
          …
          v_m = SE⟦ se_m ⟧ ρ;
          ol = α:l;
          v_oneof = ⟨_{tag,n_tags} v₁, …, v_m⟩;
          σ′ = σ[ol ↦ (v_oneof ⊔ σ[ol])];
        in ⟨ ol, σ′ ⟩ }

    E⟦ Is_tag(se) ⟧ ρ σ α =
        { ol = SE⟦ se ⟧ ρ;
          ⟨_{tag′,n_tags} v₁, …, v_m⟩ = σ[ol];
          b = if tag = tag′
              then True
              else False;
        in ⟨ b, σ ⟩ }

    E⟦ Select_{tag,i}(se) ⟧ ρ σ α =
        { ol = SE⟦ se ⟧ ρ;
          ⟨_{tag′,n_tags} v₁, …, v_m⟩ = σ[ol];
          v = if tag = tag′
              then vᵢ
              else ⊥;
        in ⟨ v, σ ⟩ }
Evaluation of List Primitives

The following five clauses give the semantics of the list manipulation primitives. The primitive
constructor Cons takes an element x and a list v_list and constructs a new list with x as its head
and v_list as its tail. The primitives Hd and Tl take a list and return the head and tail, respectively,
of the list. The constructor Nil returns a new empty list. The predicate Nil? returns True if the
value is Nil and False otherwise.

    E⟦ ˡCons(se₁, se₂) ⟧ ρ σ α = { v₁ = SE⟦ se₁ ⟧ ρ;
                                   v₂ = SE⟦ se₂ ⟧ ρ;
                                   ol = α:l;
                                   v_cons = ⟨Cons v₁, v₂⟩;
                                   σ′ = σ[ol ↦ (v_cons ⊔ σ[ol])];
                                 in ⟨ ol, σ′ ⟩ }

    E⟦ Hd(se) ⟧ ρ σ α = { ol = SE⟦ se ⟧ ρ;
                          ⟨Cons v₁, v₂⟩ = σ[ol];
                        in ⟨ v₁, σ ⟩ }

    E⟦ Tl(se) ⟧ ρ σ α = { ol = SE⟦ se ⟧ ρ;
                          ⟨Cons v₁, v₂⟩ = σ[ol];
                        in ⟨ v₂, σ ⟩ }

    E⟦ ˡNil() ⟧ ρ σ α = { ol = α:l;
                          σ′ = σ[ol ↦ (⟨Nil⟩ ⊔ σ[ol])];
                        in ⟨ ol, σ′ ⟩ }

    E⟦ Nil?(se) ⟧ ρ σ α = { ol = SE⟦ se ⟧ ρ;
                            b = if σ[ol].tag = Nil
                                then True
                                else False;
                          in ⟨ b, σ ⟩ }
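The list clauses follow the same store-passing pattern as the tuple clauses, which a short Python sketch makes visible. The sketch takes already-evaluated values instead of simple expressions, represents object labels as (activation, expression label) pairs, omits the join with the previous binding, and does not handle ⊥ or unbound labels; all of these are simplifications assumed for illustration, and the function names are hypothetical.

```python
def nil(act, label, store):
    """Allocate an empty list under the object label act:label."""
    ol = (act, label)
    return ol, {**store, ol: ("Nil",)}


def cons(v1, v2, act, label, store):
    """Allocate a cons cell whose head is v1 and whose tail is the label v2."""
    ol = (act, label)
    return ol, {**store, ol: ("Cons", v1, v2)}


def hd(ol, store):
    return store[ol][1], store


def tl(ol, store):
    return store[ol][2], store


def is_nil(ol, store):
    return store[ol][0] == "Nil", store


s0 = {}
empty, s1 = nil("α", "l0", s0)
lst, s2 = cons(7, empty, "α", "l1", s1)
print(hd(lst, s2)[0], tl(lst, s2)[0], is_nil(empty, s2)[0])
```

The allocating functions return a new store that contains every binding of the input store, while the selectors return their input store unchanged.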

2.4.6 Soundness of Standard Interpreter 

Theorem 2.3 The interpreter E is extensive with respect to the store.

    ∀ρ ∈ Env, σ₀ ∈ Store, α ∈ AL,
    ∃v ∈ V, σ₁ ∈ Store.
    ⟨ v, σ₁ ⟩ = E⟦ e ⟧ ρ σ₀ α ⟹ σ₀ ⊑ σ₁

Proof:

By structural induction:

- The clauses that interpret simple expressions and arithmetic primitives return the store
unchanged, and so these clauses are extensive with respect to the store.

- The clauses that interpret conditional and function application expressions call the
interpreter on their subexpressions with their input store. Assuming that interpretation
of subexpressions (the inductive case) is extensive, then the interpretation of conditional
and function application expressions is extensive with respect to the store.

- To prove the extensionality of the clause that interprets letrec blocks, we have to show
that EvalBindings is extensive with respect to the store. This function computes the
solution to the set of recursive equations formed by the block bindings. The solution
consists of an environment ρ′ and a store σ′. The store σ′ must include the input store
σ, because EvalBindings calls the interpreter on the binding right-hand sides with the
input store σ, and then takes the least upper bound of the resulting stores. Both of
these steps are extensive.

- Each of the clauses that allocate structures is extensive because it only adds a binding
to the store.

- Each of the clauses that fetch values from structures is extensive because it returns
the input store unchanged.



Theorem 2.4 The interpreter functions SE and E are monotonic. Given simple expression se,
expression e, and activation label α, we show monotonicity with respect to the environment and
store:

    ∀ρ₀, ρ₁ ∈ Env, σ₀, σ₁ ∈ Store,
    ρ₀ ⊑ ρ₁ ⟹ SE⟦ se ⟧ ρ₀ ⊑ SE⟦ se ⟧ ρ₁
    ρ₀ ⊑ ρ₁ ∧ σ₀ ⊑ σ₁ ⟹ E⟦ e ⟧ ρ₀ σ₀ α ⊑ E⟦ e ⟧ ρ₁ σ₁ α

Proof:

First SE, by structural induction:

- The clauses that evaluate numeric and boolean literals always return the values of those
literals; therefore, SE⟦ c ⟧ ρ₀ ⊑ SE⟦ c ⟧ ρ₁, because the values of those literals are
independent of the environment.

- The clause that evaluates identifiers looks up the identifier in the environment. If ρ₁ is
more defined than ρ₀, then the value of x in ρ₁ must be at least as well defined as the
value of x in ρ₀. Therefore, SE⟦ x ⟧ ρ₀ ⊑ SE⟦ x ⟧ ρ₁.

Now E, by structural induction:

- The clauses that evaluate constants and literals are all monotonic because the values
they return are from calls to the simple evaluator, which is monotonic, and the stores
they return are the incoming stores.

- The clauses that evaluate arithmetic and relational operators are all monotonic because
these operators are monotonic (e.g., (⊥ + 2) ⊑ (3 + 2)) and the values passed to these
operators are obtained from the simple expression evaluator, which is monotonic.

- In the clause that evaluates function applications, the argument values are obtained from
the simple expression evaluator, so they increase monotonically as the environment gets
more defined. We use our induction hypothesis to show that evaluation of the function
body is monotonic, because recursive calls to E are assumed to be monotonic.

- If we assume that the semantic conditional returns ⊥ if the predicate is undefined, then
as the predicate gets more defined the result of the conditional gets more defined. If the
predicate is either True or False, then the behavior is monotonic because we assumed
that the subsequent calls to E are monotonic.
{ def f(x,y) = 
    { t = ^{l_0}MakeTuple(x,y); 
      result = ^{k_0}g(t); 

      Dealloc(t); 
    in result }; 

  def g(t) = 
    Select_1(t) 
} 



Figure 2.4: Simple deallocation example 



— To show that evaluation of letrec blocks is monotonic, we must show that function 
EvalBindings is monotonic. Because $\mathcal{E}$ is extensive, each of the new stores created in 
EvalBindings is at least as defined as the incoming store, so EvalBindings is monotonic 
in the return store. EvalBindings is monotonic in the return environment because the 
new environment is created by binding each variable $x_i$ to the least upper bound of the 
new approximation of its value and its binding in the previous environment. Evaluation 
of letrec blocks is monotonic because EvalBindings is monotonic and because we do 
not remove the binding of a label from the store when it is deallocated. 

— Evaluation of each of the allocation primitives is monotonic because they are extensive 
with respect to the stores, and because they use the simple expression evaluator to 
evaluate their arguments. 

— Evaluation of each of the selection primitives is monotonic because they return the value 
of a structure from the input store. If the store becomes more defined then the value of 
a structure in the store must stay at least as well defined as it was before. 
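
The arithmetic case of the proof can be checked on a small finite sample. The following Python sketch is illustrative only: it assumes a flat domain of numbers in which `None` stands in for $\bot$, and the helper names (`leq`, `strict_plus`, `is_monotonic_binary`) are hypothetical, not part of the thesis.

```python
# Flat domain sketch: None stands for bottom; distinct numbers are incomparable.
BOTTOM = None


def leq(x, y):
    """x is below y in the flat domain: bottom is below everything, else equality."""
    return x is BOTTOM or x == y


def strict_plus(x, y):
    """Addition lifted to the flat domain: undefined if either argument is undefined."""
    if x is BOTTOM or y is BOTTOM:
        return BOTTOM
    return x + y


def is_monotonic_binary(op, domain):
    """Check op(a, b) is below op(c, d) whenever a is below c and b is below d."""
    return all(
        leq(op(a, b), op(c, d))
        for a in domain for b in domain
        for c in domain for d in domain
        if leq(a, c) and leq(b, d)
    )


sample = [BOTTOM, 0, 2, 3]
```

The check is exhaustive only over `sample`; it illustrates the property used in the proof rather than proving it.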



2.5 The Deallocation Problem 

We are trying to solve two related problems. One problem is: given a program with deallocation 
statements in it, verify the correctness of those deallocation statements. The second problem is to 
insert deallocation statements into a program automatically. 

In either case, we must know when a deallocation command is correct. In the program in Figure 2.4, 
procedure f contains a statement deallocating the object bound to variable t. This statement is 
correct only if the structure to which t is bound was allocated within the body of f , the structure 
does not escape from the result of f , and there is no other statement deallocating that structure. 

Thus, we are interested in four important bits of information for any program or procedure in a 
program: 

• the identities of the objects to which variables are bound; 

• the identities of the objects that procedures allocate; 
• the identities of the objects that procedures return; and 

• the identities of the objects that procedures deallocate. 

The first bit of information is used to determine the other three and to associate lifetime information 
with the program once it has been analyzed. The second and third bits let us determine which 
objects are reachable, and potentially live, outside of the current procedure activation. These two 
pieces of information are also used to determine which objects have lifetimes completely bounded 
by the lifetime of a procedure's activation frame. Given more precise information about the order 
of execution of a procedure body, its arguments, and its child procedure calls, we could perform 
better dependence analysis that would tell us which objects that are live in this procedure activation 
frame are needed after termination of this activation frame. The last bit of information is necessary 
in order to prevent errors that can occur if the heap manager is requested to deallocate the same 
object more than once. 

2.6 Overview of Our Solution 

The goal of this thesis is to develop an analysis that yields the necessary information to verify or 
insert storage reclamation code. In the next three chapters, we develop a solution to the problem 
of determining object lifetimes at compile-time. Chapter 3 describes an interpreter for KID⁻ 
that allows us to determine the unique identities of objects at run-time and to determine exactly 
when these objects are allocated and when they are no longer reachable. Chapter 4 describes 
an abstraction of this semantics that allows us to compute a generalization of the lifetimes of 
objects over all executions of a program. Chapter 5 gives algorithms for verifying and inserting 
deallocation statements using information from lifetime analysis. Chapters 6, 7, 8, and 9 extend 
the value domains and the standard, instrumented, and abstract interpreters to handle arrays, 
lists, and I-structures. Chapter 10 describes the compile-time and run-time performance of programs 
automatically annotated by the compiler, and Chapter 11 presents the conclusions we have reached 
during this work. 
Chapter 3 



Instrumented Semantics 



Before we can formalize the conditions that must be satisfied statically by a correct storage 
deallocation command, we must know the conditions that must be satisfied dynamically for that 
deallocation command to be correct. The standard KID⁻ interpreter treats the deallocation primitive 
as a no-op, and so it is not sufficient for our purposes. 

In order to determine that a deallocation command is correct, we must be able to determine that 
no reference is made to an object after it is deallocated. In a sequential interpreter, we could mark 
the location when it was deallocated, and any further reference to that location would produce an 
error. In our interpreter, however, we cannot mark an object as deallocated because of the way we 
evaluate letrec blocks — stores are repeatedly passed through all subexpressions of the block. 

In this chapter, the standard semantics will be augmented to collect information about which 
objects were allocated, dereferenced or deallocated by each expression. These collections of events 
can be examined after a program has been interpreted to see if any object was dereferenced after 
it was deallocated. 

The activation labels defined in Chapter 2 give a partial order on the time of execution of the 
instances of each subexpression in a program. In this chapter, we will assign new activation labels 
for the body of each letrec block as well as for each procedure application, so that we can measure 
finer differences in execution times. Activation labels as defined earlier are sufficient to distinguish 
each object that is created by a program, but they are not sufficient to distinguish in which control 
region a particular deallocation takes place. 

In the first section of this chapter, we will see how the information we would like to gather affects 
the structure of the instrumented interpreter. In the next section, we will present the instrumented 
interpreter. Following that, we will discuss the correctness of the interpreter with respect to the 
standard interpreter given in Chapter 2. Finally, we will work through the interpretation of a few 
examples. 



3.1 Instrumented Interpreter Characteristics 

The four pieces of information needed to verify the correctness of the deallocation of the structure 
bound to a variable during the execution of a program will help us define the domains of the 
instrumented interpreter and the signatures of its functions. 
3.1.1 Collecting the Necessary Information 

First, we must be able to identify individual objects in a program in order to determine when two 
variables are bound to the same object. We use the object labels defined in Chapter 2 to name 
objects uniquely. Although the activation label component of the object label has structure that 
allows us to determine the relative timing of object allocation and deallocation, we only consider 
equality of labels, not ordering on the structure of labels, when we manipulate sets of object labels. 
For instance, when we take the union of two sets of labels, we return a set containing all of the 
different labels. We do use the structure of individual labels as our notion of time of execution, 
though. 

Second, we must collect the labels of objects allocated during the evaluation of an expression. In the 
standard interpreter for KID⁻, each expression evaluates to a single complete value: a denotable 
value and a store. In our instrumented interpreter, each expression evaluates to a denotable value, 
a store, and three sets of events. These event collections name the objects that were allocated, 
deallocated, and referenced, and the activations in which the events occurred. 

Finally, we can examine the values to which expressions evaluate in order to see what locations 
may be reachable from the result of an expression. The result of interpretation of an expression is 
a denotable value and a store. We can traverse the value with respect to the store to determine 
the set of reachable locations. This information will be used to formulate a conservative safety 
condition for deallocation statements. We will be able to test this condition in the context of the 
deallocation statement, rather than during a postmortem after execution occurs (as is required 
when examining the allocation, deallocation, and reference events). 

3.1.2 Temporal Ordering of Execution 

Once we have collected sets of allocation, deallocation, and dereferencing events, the next step is to 
give a partial order on the execution of these events. Activation labels have structure that allows 
us to use the hierarchical termination of activations as a measure of execution time. We use that 
to order events in time. 

In the instrumented interpreter, every distinct activation label names a different control region. 
We extend the invocation and termination precedence relations from control regions to activation 
labels. Thus, we say that an activation labeled $\alpha_0$ terminates before an activation labeled $\alpha_1$ 
if the termination of the control region labeled $\alpha_0$ must precede the termination of the control 
region labeled $\alpha_1$. In other words, $\alpha_1$ is a prefix of $\alpha_0$; that is, the control region, or activation, 
named by $\alpha_1$ is an ancestor of the activation named by $\alpha_0$. In the KID⁻ interpreter, termination 
proceeds hierarchically: parent activations cannot terminate until all of their child activations 
have terminated. 

Every letrec block in KID⁻ has a group of bindings, a barrier, and a group of deallocation 
commands. If the block expression has label $k$ and is executing in activation $\alpha$, then $\alpha.k$ will be 
the label of the control region containing the group of bindings, and $\alpha.k^{-}$ will be the label of the 
group of deallocation commands. These two activation labels satisfy the relation $(\alpha.k \preceq_T \alpha.k^{-})$. 

Definition 3.1 (Activation Label Termination Order) The relation $\alpha_0 \preceq_T \alpha_1$ means that 
activation $\alpha_0$ must terminate before activation $\alpha_1$ and is defined as follows: 

\[
\alpha_0 \preceq_T \alpha_1 \equiv (\alpha_0 = \alpha_1.\beta)
\]
where $\beta$ is a string of zero or more expression labels. 
Figure 3.1: Instrumented semantic domains 



In other words, activation $\alpha_1$ must terminate before activation $\alpha_0$ terminates if $\alpha_0$ is an ancestor 
of $\alpha_1$. If activation $\alpha_0$ is preceded by $\alpha_1$, then we will say that $\alpha_0$ is an ancestor of $\alpha_1$, or that $\alpha_0$ 
is higher in the call tree than $\alpha_1$. 

We will use this notion of termination order to catch dangling pointer errors, and to give correctness 
conditions on programs to guarantee that no such errors will occur at run-time. 
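
Definition 3.1 amounts to a prefix test on activation labels. The sketch below is a minimal illustration, assuming that activation labels are encoded as Python tuples of expression labels with the empty tuple as the empty label; the function names are hypothetical.

```python
# Activation labels as tuples of expression labels; () is the empty activation label.
def terminates_before(a0, a1):
    """a0 terminates before a1: a0 equals a1 followed by zero or more expression labels."""
    return a0[:len(a1)] == a1


def is_ancestor(a0, a1):
    """a0 is an ancestor of (or equal to) a1 in the call tree."""
    return terminates_before(a1, a0)


f_body = ("k1", "k2")          # a block body
g_call = ("k1", "k2", "k0")    # a call made from inside that block
```

Because the relation is a partial order, two sibling activations such as `("k1", "a")` and `("k1", "b")` are not related in either direction.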

3.2 An Instrumented Interpreter 

Now that we have formulated some of the criteria that the instrumented interpreter must satisfy, 
let us develop the interpreter and its value domains in more detail. In this section we define an 
instrumented interpreter for KID⁻ based on the ideas presented earlier in this chapter and in the 
previous chapter. 

3.2.1 Semantic Domains 

As in the standard semantics, the instrumented semantics operates over integers, booleans, tuples, 
arrays, and lists. We will use the domains from Chapter 2, which are shown again for reference in 
Figure 3.1. The domain ordering and least upper bound were shown in Figure 2.2. 

In addition to the values and stores computed by the standard interpreter, the instrumented 
interpreter returns three sets of events. An object event pairs the object label of an object that 
was allocated, deallocated, or dereferenced with the activation label in which the allocation, 
deallocation, or dereferencing occurred. 

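The domains of allocation events (AEVs), deallocation events (DEVs), and dereferencing events 
(REVs) are defined as follows: 

\[
\begin{array}{lcl}
AEVs & = & \mathcal{P}(OL \times AL) \\
DEVs & = & \mathcal{P}(OL \times AL) \\
REVs & = & \mathcal{P}(OL \times AL)
\end{array}
\]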
Each type of event consists of an object label paired with the activation label denoting when that 
event occurred. We will refer to allocation, deallocation, and dereferencing events collectively as 
object events. The interpreter will collect sets of events, rather than sequences, because not all 
events can be ordered. 



3.2.2 Semantic Functions 

This section presents the definition of the instrumented interpreter, which augments the standard 
interpreter with mechanisms to collect object events. 

The following are the semantic functions that make up the instrumented interpreter: 

\[
\begin{array}{lcl}
\mathcal{E}_I & : & E \to Env \to Store \to AL \to (V \times Store \times AEVs \times DEVs \times REVs) \\
\mathcal{PE}_I & : & Prog \to (V \times Store \times AEVs \times DEVs \times REVs)
\end{array}
\]

The three extra values returned by the interpreter, $A^{+} \in AEVs$, $A^{-} \in DEVs$, and $A^{R} \in REVs$, 
tell us exactly which objects were allocated, deallocated, and dereferenced in each instance of each 
expression. 

The function $\mathcal{E}_I$ takes an expression, an environment, a store, and an activation label, and returns the 
resulting value, the resulting store, and the sets of allocation, deallocation, and dereferencing events 
yielded by the interpretation of that expression. The function $\mathcal{PE}_I$ takes a complete program and 
returns the result value and store and the sets of allocation, deallocation, and dereferencing events 
from the execution of the program. 

Note that $\mathcal{E}_I$, like $\mathcal{E}$, is extensive with respect to stores and also monotonic. These properties are 
necessary in order to prove that the instrumented interpreter terminates with a unique result. 

Program Evaluator Definition 

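The definition of the program evaluator is almost exactly like that of the standard program 
evaluator, except that it returns three sets of object events. Here is the definition of $\mathcal{PE}_I$, which 
interprets programs. 

\[
\begin{array}{l}
\mathcal{PE}_I [\![ \{ \cdots f_i(x_{i,1}, \ldots, x_{i,n_i}) = e_i \cdots \} ]\!] = \\
\quad \{\ \langle v, \sigma, A^{+}, A^{-}, A^{R} \rangle = \mathcal{E}_I [\![ e_0 ]\!]\ \bot_{Env}\ \bot_{Store}\ \epsilon ; \\
\quad \mathbf{in}\ \langle v, \sigma, A^{+}, A^{-}, A^{R} \rangle\ \}
\end{array}
\]

where expression $e_0$ is the body of the main procedure $f_0$. 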

Simple Expression Evaluator Definition 

The instrumented interpreter uses the simple expression evaluator from the standard interpreter. 
This is shown for reference in Figure 3.2. 
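\[
\begin{array}{lcll}
\mathcal{SE}[\![ n ]\!]\,\rho & = & n & \text{where } n \text{ is a number} \\
\mathcal{SE}[\![ b ]\!]\,\rho & = & b & \text{where } b \text{ is a boolean} \\
\mathcal{SE}[\![ x ]\!]\,\rho & = & \rho[x] & \text{where } x \text{ is a variable}
\end{array}
\]

Figure 3.2: Simple expression evaluator 

\[
\begin{array}{lcl}
\mathcal{E}_I[\![ n ]\!]\,\rho\,\sigma\,\alpha & = & \langle \mathcal{SE}[\![ n ]\!]\,\rho,\ \sigma,\ \emptyset,\ \emptyset,\ \emptyset \rangle \\
\mathcal{E}_I[\![ b ]\!]\,\rho\,\sigma\,\alpha & = & \langle \mathcal{SE}[\![ b ]\!]\,\rho,\ \sigma,\ \emptyset,\ \emptyset,\ \emptyset \rangle \\
\mathcal{E}_I[\![ x ]\!]\,\rho\,\sigma\,\alpha & = & \langle \mathcal{SE}[\![ x ]\!]\,\rho,\ \sigma,\ \emptyset,\ \emptyset,\ \emptyset \rangle \\
\mathcal{E}_I[\![ +(se_1, se_2) ]\!]\,\rho\,\sigma\,\alpha & = & \langle \mathcal{SE}[\![ se_1 ]\!]\,\rho + \mathcal{SE}[\![ se_2 ]\!]\,\rho,\ \sigma,\ \emptyset,\ \emptyset,\ \emptyset \rangle
\end{array}
\]

Figure 3.3: Evaluation of simple expressions and primitive operators 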

Expression Evaluator Definition 

In this section we discuss the definition of the instrumented expression evaluator. 

Simple expressions and primitive arithmetic and boolean operators are evaluated in a manner 
similar to that of the standard interpreter. The result is a quintuple consisting of the value, 
the incoming store, and three empty sets because simple expressions cannot update the store, or 
allocate, deallocate or dereference locations. These four clauses of the interpreter are shown in 
Figure 3.3. 

The clauses for evaluation of function applications and conditionals are shown in Figure 3.4. These 
clauses are the same as the corresponding clauses from the standard interpreter, except that evalu- 
ation of the body of the function and the taken branch of a conditional yield sets of object events. 

Evaluation of letrec blocks in the instrumented interpreter is similar to evaluation of letrec 
blocks in the standard interpreter, except that this interpreter must collect the sets of labels of 
objects allocated and deallocated by each binding right-hand-side. In addition, the body of the 
letrec block in the instrumented semantics will be evaluated in a new activation, whose label is the 
letrec block's expression label concatenated to the current activation label. This new activation 
label gives us a more precise notion of when objects are allocated, deallocated, and dereferenced. 
This information will be used to determine if any dangling pointer errors occur. The labels of 
objects deallocated by the deallocation statements of a letrec block are returned with the set of 
labels of objects deallocated during execution of the bindings. The interpreter clause for letrec 
expressions is shown in Figure 3.5. 
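
The iteration performed for the bindings of a letrec block can be sketched in Python. This is a simplified illustration rather than the thesis's definition: it assumes a flat value domain with `None` as $\bot$, stores encoded as dictionaries joined by union, and right-hand sides encoded as Python functions from an environment and a store to a quintuple. All names (`eval_bindings`, `const`, `succ_of`) are hypothetical, and the activation label argument is omitted for brevity.

```python
BOTTOM = None  # stands in for the undefined value


def join(new, old):
    """Least upper bound in a flat domain, assuming new and old are consistent."""
    return old if new is BOTTOM else new


def eval_bindings(bindings, env, store):
    """Re-evaluate all right-hand sides until the environment and store stop changing."""
    while True:
        new_env, new_store = dict(env), dict(store)
        allocs, deallocs, derefs = set(), set(), set()
        for var, rhs in bindings.items():
            value, out_store, a_plus, a_minus, a_ref = rhs(env, store)
            new_env[var] = join(value, env.get(var, BOTTOM))
            new_store.update(out_store)  # store join, modeled as union of bindings
            allocs |= a_plus
            deallocs |= a_minus
            derefs |= a_ref
        if new_env == env and new_store == store:
            # Fixed point: return the event sets of the final pass.
            return new_env, new_store, allocs, deallocs, derefs
        env, store = new_env, new_store


def const(n):
    """Right-hand side that evaluates to the constant n."""
    return lambda env, store: (n, store, set(), set(), set())


def succ_of(name):
    """Right-hand side that evaluates to the variable `name` plus one."""
    def rhs(env, store):
        v = env.get(name, BOTTOM)
        return (BOTTOM if v is BOTTOM else v + 1), store, set(), set(), set()
    return rhs


# b depends on a but is listed first; the iteration still converges.
bindings = {"b": succ_of("a"), "a": const(1)}
final_env, final_store, *_ = eval_bindings(bindings, {"a": BOTTOM, "b": BOTTOM}, {})
```

The loop terminates here because each pass can only make the environment more defined; for arbitrary right-hand sides, termination depends on the monotonicity and extensivity properties discussed in the text.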

Figure 3.6 gives the evaluation rules for tuple data structures. These are similar to the corresponding 
clauses of the standard interpreter, except that they return object events. MakeTuple returns the 
same value and store in the instrumented interpreter as in the standard interpreter, but it also 
returns three sets of object events. The dereferencing and deallocation event sets are both empty, 
but the allocation event set consists of a single element: the object label paired with the current 
activation label. The primitive Select_i returns the $i$th component of the tuple, the incoming store, 
empty sets of allocation and deallocation events, and a dereferencing event set consisting of a single 
element: the object label of the argument paired with the current activation label. 
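
The tuple clauses can be mimicked with a few lines of Python. The sketch below is an assumption-laden illustration: stores are dictionaries, object labels are pairs of an activation label (a tuple) and an expression label, and the function names are hypothetical. The replay uses the values and labels of the non-recursive example in Section 3.3.1.

```python
def make_tuple(values, store, alpha, label):
    """Instrumented MakeTuple: returns (value, store, allocs, deallocs, derefs)."""
    ol = (alpha, label)  # object label: activation label paired with expression label
    new_store = {**store, ol: ("Tuple",) + tuple(values)}
    return ol, new_store, {(ol, alpha)}, set(), set()


def select(i, ol, store, alpha):
    """Instrumented Select_i (1-based): records a single dereferencing event."""
    _tag, *components = store[ol]
    return components[i - 1], store, set(), set(), {(ol, alpha)}


# Replay of g(68) followed by the two selections made in the body of f.
f_body = ("k1", "k2")
g_body = ("k1", "k2", "k0", "k3")
t, store, allocs, _, _ = make_tuple([68, 68 - 21], {}, g_body, "l0")
a, store, _, _, refs_a = select(1, t, store, f_body)
b, store, _, _, refs_b = select(2, t, store, f_body)
result = a * b
derefs = refs_a | refs_b
```

Because events are collected in sets, the two selections in the same activation contribute a single dereferencing event.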
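\[
\begin{array}{l}
\mathcal{E}_I [\![\ {}^{k} f(se_1, \ldots, se_n)\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \{\ v_1 = \mathcal{SE}[\![ se_1 ]\!]\,\rho ; \\
\quad\ \ \vdots \\
\quad\ \ v_n = \mathcal{SE}[\![ se_n ]\!]\,\rho ; \\
\quad\ \ \alpha' = \alpha.k ; \\
\quad\ \ \langle v, \sigma', A^{+}, A^{-}, A^{R} \rangle = \mathcal{E}_I[\![ e ]\!]\ (\rho[v_1/x_1, \ldots, v_n/x_n])\ \sigma\ \alpha' ; \\
\quad \mathbf{in}\ \langle v, \sigma', A^{+}, A^{-}, A^{R} \rangle\ \} \\
\quad \text{where } f(x_1, \ldots, x_n) = e \text{ is a definition in the program} \\[1ex]
\mathcal{E}_I [\![\ \mathbf{if}\ (se, e_1, e_2)\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \mathbf{if}\ \mathcal{SE}[\![ se ]\!]\,\rho \\
\quad \mathbf{then}\ \mathcal{E}_I[\![ e_1 ]\!]\,\rho\,\sigma\,\alpha \\
\quad \mathbf{else}\ \mathcal{E}_I[\![ e_2 ]\!]\,\rho\,\sigma\,\alpha
\end{array}
\]

Figure 3.4: Evaluation of conditional expressions 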

Figure 3.7 contains the clauses of $\mathcal{E}_I$ for the array primitives. These clauses are the same as the 
clauses from the standard interpreter except that they return sets of allocation, deallocation, and 
dereferencing events in addition to a value and store. The primitive MakeArray_{f_i} collects the 
allocation, deallocation, and dereferencing events from each of the calls to $f_i$ and augments the 
set $A^{+}$ of allocation events to include the allocation of object $ol$ in activation $\alpha$. The Fetch and 
Bounds primitives record that the label $ol$ of the array passed to them was referenced in the current 
activation $\alpha$ in $A^{R}$, the set of reference events that they return. 

The evaluator clauses for algebraic types, given in Figure 3.8, and the evaluator clauses for list 
primitives, given in Figure 3.9, are similar to the corresponding clauses from the standard inter- 
preter, except that the constructors return non-empty allocation event sets, and the selectors and 
predicates return non-empty dereferencing event sets. 

3.2.3 Correctness of the Interpreter 

We will consider the instrumented interpreter to be correct if the denotable value and the store 
returned from the execution of a program under the instrumented interpreter are always equal 
to the denotable value and store returned by the execution of the program under the standard 
interpreter. 

Theorem 3.2 The instrumented interpreter is correct with respect to the standard interpreter. 

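\[
\begin{array}{l}
\forall pr \in Prog, \\
\quad \exists \langle v_s, \sigma_s \rangle = \mathcal{PE}[\![ pr ]\!], \\
\quad \exists \langle v_I, \sigma_I, A^{+}, A^{-}, A^{R} \rangle = \mathcal{PE}_I[\![ pr ]\!], \\
\quad (v_s = v_I) \wedge (\sigma_s = \sigma_I)
\end{array}
\]

Proof: 

Informally: The instrumented interpreter computes the same values and stores as the standard 
interpreter, because all portions of the interpreter that compute values and stores are 
the same as in the standard interpreter. 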
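\[
\begin{array}{l}
\mathcal{E}_I [\![\ {}^{k}\{\, Bs\ \text{---}\ Ds\ \mathbf{in}\ x \,\}\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \{\ [\![ x_1 = e_1; \ldots; x_n = e_n ]\!] = Bs ; \\
\quad\ \ [\![ Dealloc(y_1); \ldots; Dealloc(y_k) ]\!] = Ds ; \\
\quad\ \ \rho_0 = \rho[\bot/x_1, \ldots, \bot/x_n] ; \\
\quad\ \ \alpha' = \alpha.k ; \\
\quad\ \ \langle \rho', \sigma', A^{+\prime}, A^{-\prime}, A^{R\prime} \rangle = EvalBindings_I(Bs, \rho_0, \sigma, \alpha') ; \\
\quad\ \ A^{-\prime\prime} = A^{-\prime} \cup \{ (\rho'[y_i], \alpha') \mid 1 \le i \le k \} ; \\
\quad \mathbf{in}\ \langle \rho'[x], \sigma', A^{+\prime}, A^{-\prime\prime}, A^{R\prime} \rangle\ \}
\end{array}
\]

where 

\[
\begin{array}{l}
EvalBindings_I([\![ x_1 = e_1; \ldots; x_n = e_n ]\!], \rho, \sigma, \alpha) = \\
\quad \{\ \langle v_1, \sigma_1, A^{+}_1, A^{-}_1, A^{R}_1 \rangle = \mathcal{E}_I[\![ e_1 ]\!]\,\rho\,\sigma\,\alpha ; \\
\quad\ \ \vdots \\
\quad\ \ \langle v_n, \sigma_n, A^{+}_n, A^{-}_n, A^{R}_n \rangle = \mathcal{E}_I[\![ e_n ]\!]\,\rho\,\sigma\,\alpha ; \\
\quad\ \ \rho' = \rho[(v_1 \sqcup \rho[x_1])/x_1, \ldots, (v_n \sqcup \rho[x_n])/x_n] ; \\
\quad\ \ \sigma' = \bigsqcup_i \sigma_i ; \\
\quad\ \ A^{+\prime} = \bigcup_i A^{+}_i ; \\
\quad\ \ A^{-\prime} = \bigcup_i A^{-}_i ; \\
\quad\ \ A^{R\prime} = \bigcup_i A^{R}_i ; \\
\quad\ \ \langle \rho'', \sigma'', A^{+\prime\prime}, A^{-\prime\prime}, A^{R\prime\prime} \rangle = \\
\quad\ \ \quad \mathbf{if}\ \rho' = \rho \wedge \sigma' = \sigma \\
\quad\ \ \quad \mathbf{then}\ \langle \rho', \sigma', A^{+\prime}, A^{-\prime}, A^{R\prime} \rangle \\
\quad\ \ \quad \mathbf{else}\ EvalBindings_I([\![ x_1 = e_1; \ldots; x_n = e_n ]\!], \rho', \sigma', \alpha) \\
\quad \mathbf{in}\ \langle \rho'', \sigma'', A^{+\prime\prime}, A^{-\prime\prime}, A^{R\prime\prime} \rangle\ \}
\end{array}
\]

Figure 3.5: Evaluation of block expressions 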

3.2.4 Soundness of the Instrumented Interpreter 

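Theorem 3.3 The instrumented interpreter $\mathcal{E}_I$ is extensive with respect to stores. 

\[
\begin{array}{l}
\forall e \in E,\ \forall \alpha \in AL,\ \forall \rho \in Env,\ \forall \sigma_0 \in Store, \\
\quad \exists \langle v, \sigma_1, A^{+}, A^{-}, A^{R} \rangle = \mathcal{E}_I[\![ e ]\!]\,\rho\,\sigma_0\,\alpha, \\
\quad \sigma_0 \sqsubseteq \sigma_1
\end{array}
\]

Proof: 

Similar to the proof of the extensionality of the standard interpreter. 

Theorem 3.4 Interpreter function $\mathcal{E}_I$ is monotonic with respect to the context: 

\[
\begin{array}{l}
\forall e \in E,\ \forall \alpha \in AL,\ \forall \rho_0, \rho_1 \in Env,\ \forall \sigma_0, \sigma_1 \in Store, \\
\quad \rho_0 \sqsubseteq \rho_1 \wedge \sigma_0 \sqsubseteq \sigma_1 \Rightarrow \mathcal{E}_I[\![ e ]\!]\,\rho_0\,\sigma_0\,\alpha \sqsubseteq \mathcal{E}_I[\![ e ]\!]\,\rho_1\,\sigma_1\,\alpha
\end{array}
\]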
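\[
\begin{array}{l}
\mathcal{E}_I [\![\ {}^{l} MakeTuple(se_1, \ldots, se_m)\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \{\ v_1 = \mathcal{SE}[\![ se_1 ]\!]\,\rho ; \\
\quad\ \ \vdots \\
\quad\ \ v_m = \mathcal{SE}[\![ se_m ]\!]\,\rho ; \\
\quad\ \ ol = \alpha : l ; \\
\quad\ \ v_{tuple} = \langle Tuple\ v_1, \ldots, v_m \rangle ; \\
\quad\ \ \sigma' = \sigma[ol \mapsto v_{tuple}] ; \\
\quad \mathbf{in}\ \langle ol, \sigma', \{(ol, \alpha)\}, \emptyset, \emptyset \rangle\ \} \\[1ex]
\mathcal{E}_I [\![\ Select_i(se)\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\,\rho ; \\
\quad\ \ \langle Tuple\ v_1, \ldots, v_m \rangle = \sigma[ol] ; \\
\quad \mathbf{in}\ \langle v_i, \sigma, \emptyset, \emptyset, \{(ol, \alpha)\} \rangle\ \}
\end{array}
\]

Figure 3.6: Evaluation of tuple primitives 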

Proof: 

Similar to the proof of the monotonicity of the standard interpreter. 

3.3 Interpretation of Some Examples 

In this section we will evaluate a couple of examples under the instrumented interpreter to illustrate 
its behavior. 

3.3.1 Interpretation of a Non-Recursive Example 

We will start with a non-recursive example: 

{ def f(w) = 
    ^{k_2}{ t = ^{k_0}g(w); 
            a = Select_1(t); 
            b = Select_2(t); 
            r = (a * b); 
          in r }; 
  def g(x) = 
    ^{k_3}{ y = (x-21); 
            t = ^{l_0}MakeTuple(x,y); 
          in t }; 
  def f_0 = 
    ^{k_1}f(68); 
} 

If we execute the program using the instrumented interpreter, we get the following call tree: 
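\[
\begin{array}{l}
\mathcal{E}_I [\![\ {}^{k} MakeArray_{f_i}(se_0, se_1, \ldots, se_r)\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \{\ ol = \alpha : k ; \\
\quad\ \ n = \mathcal{SE}[\![ se_0 ]\!]\,\rho ; \\
\quad\ \ v_1 = \mathcal{SE}[\![ se_1 ]\!]\,\rho ; \\
\quad\ \ \vdots \\
\quad\ \ v_r = \mathcal{SE}[\![ se_r ]\!]\,\rho ; \\
\quad\ \ \alpha_0 = \alpha.k.(0) ; \\
\quad\ \ \langle u_0, \sigma_0, A^{+}_0, A^{-}_0, A^{R}_0 \rangle = \mathcal{E}_I[\![ e_i ]\!]\ \bot_{Env}[0/x_0, v_1/x_1, \ldots, v_r/x_r]\ \sigma\ \alpha_0 ; \\
\quad\ \ \vdots \\
\quad\ \ \alpha_{n-1} = \alpha.k.(n-1) ; \\
\quad\ \ \langle u_{n-1}, \sigma_{n-1}, A^{+}_{n-1}, A^{-}_{n-1}, A^{R}_{n-1} \rangle = \mathcal{E}_I[\![ e_i ]\!]\ \bot_{Env}[(n-1)/x_0, v_1/x_1, \ldots, v_r/x_r]\ \sigma\ \alpha_{n-1} ; \\
\quad\ \ \sigma' = \sigma[ol \mapsto \langle Array\ n, u_0, \ldots, u_{n-1} \rangle] \sqcup \left( \bigsqcup_j \sigma_j \right) ; \\
\quad\ \ A^{+} = \{(ol, \alpha)\} \cup \left( \bigcup_j A^{+}_j \right) ; \\
\quad\ \ A^{-} = \bigcup_j A^{-}_j ; \\
\quad\ \ A^{R} = \bigcup_j A^{R}_j ; \\
\quad \mathbf{in}\ \langle ol, \sigma', A^{+}, A^{-}, A^{R} \rangle\ \} \\
\quad \text{where } f_i(x_0, x_1, \ldots, x_r) = e_i \text{ is a definition in the program} \\[1ex]
\mathcal{E}_I [\![\ Fetch(se_1, se_2)\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se_1 ]\!]\,\rho ; \\
\quad\ \ i = \mathcal{SE}[\![ se_2 ]\!]\,\rho ; \\
\quad\ \ \langle Array\ n, v_0, \ldots, v_{n-1} \rangle = \sigma[ol] ; \\
\quad \mathbf{in}\ \langle v_i, \sigma, \emptyset, \emptyset, \{(ol, \alpha)\} \rangle\ \} \\[1ex]
\mathcal{E}_I [\![\ Bounds(se)\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\,\rho ; \\
\quad\ \ \langle Array\ n, v_0, \ldots, v_{n-1} \rangle = \sigma[ol] ; \\
\quad \mathbf{in}\ \langle n, \sigma, \emptyset, \emptyset, \{(ol, \alpha)\} \rangle\ \}
\end{array}
\]

Figure 3.7: Instrumented evaluation of array primitives 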




f(68)        activation $\epsilon.k_1$ 
  g(w)       activation $\epsilon.k_1.k_2.k_0$ 
             t = $\epsilon.k_1.k_2.k_0.k_3 : l_0$ 
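\[
\begin{array}{l}
\mathcal{E}_I [\![\ {}^{l} MakeOneof_{tag,ntags}(se_1, \ldots, se_m)\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \{\ v_1 = \mathcal{SE}[\![ se_1 ]\!]\,\rho ; \\
\quad\ \ \vdots \\
\quad\ \ v_m = \mathcal{SE}[\![ se_m ]\!]\,\rho ; \\
\quad\ \ ol = \alpha : l ; \\
\quad\ \ v_{oneof} = \langle tag, ntags\ v_1, \ldots, v_m \rangle ; \\
\quad\ \ \sigma' = \sigma[ol \mapsto (v_{oneof} \sqcup \sigma[ol])] ; \\
\quad \mathbf{in}\ \langle ol, \sigma', \{(ol, \alpha)\}, \emptyset, \emptyset \rangle\ \} \\[1ex]
\mathcal{E}_I [\![\ Is_{tag}?(se)\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\,\rho ; \\
\quad\ \ \langle tag', ntags\ v_1, \ldots, v_m \rangle = \sigma[ol] ; \\
\quad\ \ b = \mathbf{if}\ tag = tag'\ \mathbf{then}\ True\ \mathbf{else}\ False ; \\
\quad \mathbf{in}\ \langle b, \sigma, \emptyset, \emptyset, \{(ol, \alpha)\} \rangle\ \} \\[1ex]
\mathcal{E}_I [\![\ Select_{tag,i}(se)\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\,\rho ; \\
\quad\ \ \langle tag', ntags\ v_1, \ldots, v_m \rangle = \sigma[ol] ; \\
\quad\ \ v = \mathbf{if}\ tag = tag'\ \mathbf{then}\ v_i\ \mathbf{else}\ \bot ; \\
\quad \mathbf{in}\ \langle v, \sigma, \emptyset, \emptyset, \{(ol, \alpha)\} \rangle\ \}
\end{array}
\]

Figure 3.8: Instrumented evaluation of oneof primitives 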



Each node in the call tree is labeled with the expression that invoked the procedure corresponding 
to that node and with the activation label of that node. We also show the binding of variable t in 
procedure f to the tuple allocated within g, labeled $\epsilon.k_1.k_2.k_0.k_3 : l_0$. 

The result under the instrumented semantics is: 

\[
\begin{array}{l}
\langle\ 3196, \\
\quad \bot_{Store}[\epsilon.k_1.k_2.k_0.k_3 : l_0 \mapsto \langle Tuple\ 68, 47 \rangle], \\
\quad \{ (\epsilon.k_1.k_2.k_0.k_3 : l_0,\ \epsilon.k_1.k_2.k_0.k_3) \}, \\
\quad \emptyset, \\
\quad \{ (\epsilon.k_1.k_2.k_0.k_3 : l_0,\ \epsilon.k_1.k_2) \}\ \rangle
\end{array}
\]

The first component of the result indicates that the answer was the number 3196. The second 
component of the result, the store, indicates the store at the end of the evaluation of the example. 
The third component indicates that a single location, $\epsilon.k_1.k_2.k_0.k_3 : l_0$, was allocated, the fourth 
component indicates that no locations were deallocated, and the final component indicates that 
location $\epsilon.k_1.k_2.k_0.k_3 : l_0$ was dereferenced during execution in activation $\epsilon.k_1.k_2$. 

In this program, the lifetime of the tuple that g allocates is from the time that g allocated the tuple 
until procedure f terminates, because that is the last time that there is a pointer to the tuple. We 
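\[
\begin{array}{l}
\mathcal{E}_I [\![\ {}^{l} Cons(se_1, se_2)\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \{\ v_1 = \mathcal{SE}[\![ se_1 ]\!]\,\rho ; \\
\quad\ \ v_2 = \mathcal{SE}[\![ se_2 ]\!]\,\rho ; \\
\quad\ \ v_{list} = \langle Cons\ v_1, v_2 \rangle ; \\
\quad\ \ ol = \alpha : l ; \\
\quad\ \ \sigma' = \sigma[ol \mapsto v_{list}] ; \\
\quad \mathbf{in}\ \langle ol, \sigma', \{(ol, \alpha)\}, \emptyset, \emptyset \rangle\ \} \\[1ex]
\mathcal{E}_I [\![\ Hd(se)\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\,\rho ; \\
\quad\ \ \langle Cons\ v_1, v_2 \rangle = \sigma[ol] ; \\
\quad \mathbf{in}\ \langle v_1, \sigma, \emptyset, \emptyset, \{(ol, \alpha)\} \rangle\ \} \\[1ex]
\mathcal{E}_I [\![\ Tl(se)\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\,\rho ; \\
\quad\ \ \langle Cons\ v_1, v_2 \rangle = \sigma[ol] ; \\
\quad \mathbf{in}\ \langle v_2, \sigma, \emptyset, \emptyset, \{(ol, \alpha)\} \rangle\ \} \\[1ex]
\mathcal{E}_I [\![\ {}^{l} Nil()\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \{\ ol = \alpha : l ; \\
\quad\ \ v_{list} = \langle Nil \rangle ; \\
\quad\ \ \sigma' = \sigma[ol \mapsto v_{list}] ; \\
\quad \mathbf{in}\ \langle ol, \sigma', \{(ol, \alpha)\}, \emptyset, \emptyset \rangle\ \} \\[1ex]
\mathcal{E}_I [\![\ Nil?(se)\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se ]\!]\,\rho ; \\
\quad\ \ b = \mathbf{if}\ \sigma[ol].tag = Nil\ \mathbf{then}\ True\ \mathbf{else}\ False ; \\
\quad \mathbf{in}\ \langle b, \sigma, \emptyset, \emptyset, \{(ol, \alpha)\} \rangle\ \}
\end{array}
\]

Figure 3.9: Instrumented evaluation of list primitives 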



can also say that the lifetime of the tuple labeled $\epsilon.k_1.k_2.k_0.k_3 : l_0$ is bounded by the lifetime of 
activation $\epsilon.k_1.k_2$. 
3.3.2 Interpretation of a Recursive Example 

Now let us evaluate a recursive example. 

{ def foo(t) = 
    ^{k_2}{ a = Select_1(t); 
            b = Select_2(t); 
            p = (a == 5); 
            r = if p then a 
                else ^{k_3}{ t' = ^{l_0}MakeTuple(5,7); 
                             v = ^{k_0}foo(t'); 
                           in v } 
          in r }; 
  def f_0 = 
    ^{k_4}{ t0 = ^{l_1}MakeTuple(3,4); 
            result = ^{k_1}foo(t0) 
          in result } 
} 

Evaluation of this program under the instrumented interpreter yields the following call tree: 




foo(t0)        activation $\epsilon.k_4.k_1$ 
               t0 = $\epsilon.k_4 : l_1$ 
  foo(t')      activation $\epsilon.k_4.k_1.k_2.k_3.k_0$ 
               t' = $\epsilon.k_4.k_1.k_2.k_3 : l_0$ 



The result under the instrumented semantics is: 

\[
\begin{array}{l}
\langle\ 5, \\
\quad \bot_{Store}[\epsilon.k_4.k_1.k_2.k_3 : l_0 \mapsto \langle Tuple\ 5, 7 \rangle,\ \epsilon.k_4 : l_1 \mapsto \langle Tuple\ 3, 4 \rangle], \\
\quad \{ (\epsilon.k_4.k_1.k_2.k_3 : l_0,\ \epsilon.k_4.k_1.k_2.k_3),\ (\epsilon.k_4 : l_1,\ \epsilon.k_4) \}, \\
\quad \emptyset, \\
\quad \{ (\epsilon.k_4 : l_1,\ \epsilon.k_4.k_1.k_2),\ (\epsilon.k_4.k_1.k_2.k_3 : l_0,\ \epsilon.k_4.k_1.k_2.k_3.k_0.k_2) \}\ \rangle
\end{array}
\]

which shows that the result was the number 5 and that two tuples, labeled $\epsilon.k_4 : l_1$ and 
$\epsilon.k_4.k_1.k_2.k_3 : l_0$, were allocated. Neither of these tuples is reachable from the result, and no 
tuples were deallocated. Both objects were dereferenced; the object labeled $\epsilon.k_4 : l_1$ was 
dereferenced in activation $\epsilon.k_4.k_1.k_2$ and the object labeled $\epsilon.k_4.k_1.k_2.k_3 : l_0$ was dereferenced 
in activation $\epsilon.k_4.k_1.k_2.k_3.k_0.k_2$. 

3.4 Object Deallocation Safety Condition 

A number of definitions are needed before we can give a safety condition for object deallocations. 
Given a denotable value $v$ and a store $\sigma$, we must be able to determine the labels of the objects 
reachable from value $v$ in store $\sigma$. The following definition specifies which objects are reachable from 
a given dynamic value and store. 

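Definition 3.5 (Object Reachability) $Reachable(v, \sigma)$, the set of labels of objects reachable 
from value $v$ with respect to store $\sigma$, is defined as follows: 

\[
\begin{array}{lcl}
Reachable(\bot, \sigma) & = & \emptyset \\
Reachable(n, \sigma) & = & \emptyset \\
Reachable(b, \sigma) & = & \emptyset \\
Reachable(ol, \sigma) & = & \{ol\} \cup SVReachable(\sigma[ol], \sigma) \\[1ex]
SVReachable(\bot, \sigma) & = & \emptyset \\
SVReachable(\langle Tuple\ v_1, \ldots, v_n \rangle, \sigma) & = & \bigcup_i Reachable(v_i, \sigma) \\
SVReachable(\langle Array\ n, v_1, \ldots, v_n \rangle, \sigma) & = & \bigcup_i Reachable(v_i, \sigma) \\
SVReachable(\langle tag, n\ v_1, \ldots, v_m \rangle, \sigma) & = & \bigcup_i Reachable(v_i, \sigma) \\
SVReachable(\langle Cons\ v_1, v_2 \rangle, \sigma) & = & Reachable(v_1, \sigma) \cup Reachable(v_2, \sigma) \\
SVReachable(\langle Nil \rangle, \sigma) & = & \emptyset
\end{array}
\]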



We also need to know what objects are reachable from the context surrounding an expression. We 
will call these objects the inherited objects. These are the objects that an expression can use that 
were allocated outside of the expression. 

Definition 3.6 (Inherited Objects) The function $Inherited(e, \rho, \sigma)$ returns the set of labels of 
objects reachable from $FV(e)$ given environment $\rho$ and store $\sigma$: 

\[
Inherited(e, \rho, \sigma) = \bigcup_{w \in FV(e)} Reachable(\rho[w], \sigma)
\]

Remember that if variable $w$ is unbound in environment $\rho$, then $\rho[w]$ is bottom. 
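
Definitions 3.5 and 3.6 can be read as a graph traversal over the store. The Python sketch below illustrates that reading under several assumptions that are not from the thesis: object labels are modeled by a hypothetical `ObjectLabel` named tuple, stored structures are plain tuples whose first element is a constructor tag, `None` plays the role of bottom, and the free variables of an expression are passed in directly as a collection of names. The sample store, which contains a self-referential cell, is invented for illustration.

```python
from typing import NamedTuple


class ObjectLabel(NamedTuple):
    activation: tuple  # activation label in which the object was allocated
    label: str         # expression label of the allocating primitive


def reachable(value, store):
    """Labels of objects reachable from value with respect to store."""
    seen, stack = set(), [value]
    while stack:
        v = stack.pop()
        if isinstance(v, ObjectLabel) and v not in seen:
            seen.add(v)
            structure = store.get(v)         # None plays the role of bottom
            if structure is not None:
                stack.extend(structure[1:])  # skip the constructor tag
    return seen


def inherited(free_vars, env, store):
    """Labels of objects reachable from the given free variables."""
    labels = set()
    for w in free_vars:
        labels |= reachable(env.get(w), store)  # unbound variables are bottom
    return labels


t1 = ObjectLabel(("k4",), "l1")
t2 = ObjectLabel(("k4", "k1", "k2", "k3"), "l0")
store = {t1: ("Tuple", 3, t2), t2: ("Cons", 5, t2)}  # t2 refers to itself
env = {"x": t1, "y": 7}
```

The `seen` set keeps the traversal finite even when the store contains cycles, a detail the recursive definition leaves implicit.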

Previously, we defined a dangling reference, or dangling pointer, to be a pointer that was deref- 
erenced after it was deallocated. A pointer will also be considered dangling if the activation in 
which the object is deallocated may terminate before the activation in which the object is allocated 
(because an allocation is another form of dereferencing a pointer). To be more precise, the activa- 
tion in which an object is allocated or dereferenced must always terminate before the activation in 
which the object is deallocated. 

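Definition 3.7 (Dangling Reference) For a program $pr$, let 

\[
\begin{array}{lcl}
\langle v, \sigma, A^{+}, A^{-}, A^{R} \rangle & = & \mathcal{PE}_I[\![ pr ]\!] \\
R & = & Reachable(v, \sigma)
\end{array}
\]

Then $\mathcal{DP}[\![ pr ]\!]$, the set of dangling pointers after the execution of program $pr$, is defined by: 

\[
\begin{array}{l}
\mathcal{DP}[\![ pr ]\!] = \{\ (ol, \alpha_{-}) \mid (ol, \alpha_{-}) \in A^{-} \\
\qquad \wedge\ (\ ol \in R \\
\qquad\quad \vee\ (\exists \alpha_r.\ (ol, \alpha_r) \in A^{R} \wedge \neg(\alpha_r \preceq_T \alpha_{-})) \\
\qquad\quad \vee\ (\exists \alpha_{+}.\ (ol, \alpha_{+}) \in A^{+} \wedge \neg(\alpha_{+} \preceq_T \alpha_{-}))\ )\ \}
\end{array}
\]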



The set of dangling pointer events is the set of all pairs of object labels $ol$ and activation labels 
$\alpha_{-}$ such that $ol$ is deallocated in activation $\alpha_{-}$ and: (1) a reference to $ol$ is returned as part 
of the result of the program, (2) there is a reference to $ol$ in some activation $\alpha_r$ that does not 
terminate before activation $\alpha_{-}$, or (3) the activation in which $ol$ was allocated does not terminate 
before activation $\alpha_{-}$. Any of these conditions counts as a dangling pointer error. 
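
Given the three event sets and the set of labels reachable from the result, the set of dangling pointer events can be computed directly. The following Python sketch is one possible reading of Definition 3.7, assuming events are encoded as pairs of an object label and an activation label, and activation labels as tuples; the function names and the sample scenario are hypothetical.

```python
def terminates_before(a0, a1):
    """a0 terminates before a1: a0 equals a1 followed by zero or more expression labels."""
    return a0[:len(a1)] == a1


def dangling_pointers(result_reachable, allocs, deallocs, derefs):
    """Deallocation events that escape in the result or precede a use or the allocation."""
    dangling = set()
    for ol, a_minus in deallocs:
        escapes = ol in result_reachable
        late_ref = any(
            o == ol and not terminates_before(a_r, a_minus) for o, a_r in derefs
        )
        late_alloc = any(
            o == ol and not terminates_before(a_p, a_minus) for o, a_p in allocs
        )
        if escapes or late_ref or late_alloc:
            dangling.add((ol, a_minus))
    return dangling


# Hypothetical scenario: an object allocated in a block and dereferenced in a callee.
block = ("k1", "k2")
callee = ("k1", "k2", "k0")
t = "t"
allocs = {(t, block)}
derefs = {(t, callee)}
```

Deallocating `t` in `block` yields no dangling pointer event, whereas deallocating it in `callee` does, because the allocating activation does not terminate before the callee.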

The deallocation of an object $ol$ in program $pr$ at activation label $\alpha_{-}$ is considered correct if the 
pair $(ol, \alpha_{-})$ does not show up in the set of dangling pointer events resulting from the execution 
of program $pr$. 

Condition 3.8 (Deallocation Correctness) The deallocation of object $ol$ upon termination of 
activation $\alpha_{-}$ is correct if the following condition holds: 

\[
(ol, \alpha_{-}) \notin \mathcal{DP}[\![ pr ]\!]
\]

where $pr$ is the program. 

Condition 3.8 is exact, in that any deallocation command that does not lead to dangling pointer 
errors will be considered correct. However, we have to execute the whole program before we can 
determine if any deallocation command is correct. 

The reason we cannot verify Condition 3.8 as we evaluate each letrec block is that an object 
may be deallocated in some letrec block, and returned as part of the result of that block. This 
deallocation is correct as long as no attempt is ever made to dereference the object once it has 
been deallocated. To be more precise, the letrec block corresponds to one control region, or 
activation, and we can deallocate the structure in this control region as long as the structure is 
never dereferenced in a control region that is an ancestor of this one. 

When we verify that deallocation commands are correct, we are willing to be a little less precise 
and to only accept deallocation commands that deallocate objects in the highest control region 
from which the objects are reachable. This property we call safety — if a deallocation command is 
safe, then it is guaranteed to be correct, although some correct deallocation commands are unsafe. 

There are two reasons to use the deallocation safety condition rather than the deallocation correct- 
ness condition when we test deallocation commands. One is that safety is a local property, and so 
this allows us to verify the safety of a deallocation command in a procedure without considering 
all of the places in which the procedure might be called. This point is especially important if 
the algorithm is to be generalized to an environment including separate compilation. The second 
reason is that the simplest version of our abstract interpreter summarizes all activation labels by 
the empty activation label, and so we cannot tell the relative ordering of subexpressions. 

An object deallocation is safe, or guaranteed to be correct, if the deallocation occurs in the highest 
dynamic context from which the object is reachable. 

Condition 3.9 (Object Deallocation Safety) It is safe to deallocate object ol in context $(\rho_{-}, \sigma_{-}, a_{-})$,
where

    \[ (v, \sigma', A^{+}, A^{-}, A^{R}) = \mathcal{E}_I [\![ e ]\!]\, \rho_{-}\, \sigma_{-}\, a_{-} \]
    \[ R = \mathit{Reachable}(v, \sigma') \]

if the following condition holds:

    \[
    \begin{array}{l}
    (ol, a_{-}) \in A^{-} \\
    \land\; ol \notin R \\
    \land\; \forall (ol, a_r) \in A^{R}.\ (a_r <_T a_{-}) \\
    \land\; \forall (ol, a_{+}) \in A^{+}.\ (a_{+} <_T a_{-})
    \end{array}
    \]

and if there is only one deallocation of ol, which is in activation $a_{-}$.

This condition is correct whenever an object is deallocated in the highest control region from which 
it is reachable. For instance, it is always safe to deallocate an object that is not part of the result 
of a program upon termination of the main procedure $f_0$, although this may not be of much use.

Unlike Condition 3.8, Condition 3.9 can be checked at the time the deallocate is performed by 
examining the current program state: the environments of enclosing contexts and the objects 
reachable from those contexts and the current block expression. For this reason, this condition will 
be used in Chapters 4 and 5 to develop a static analysis for verifying and inserting deallocation 
commands. 

Theorem 3.10 (Deallocation Safety Theorem) If an object deallocation satisfies Condition 3.9 
(Deallocation Safety), then it satisfies Condition 3.8 (Deallocation Correctness) . 

Proof (sketch): If an object ol is deallocated in activation $a_{-}$, the highest activation from
which ol is reachable, then the allocation of ol and all dereferencing of ol must take place in
activations labeled $a_r$ such that each $a_r$ terminates before $a_{-}$. $\Box$

In the next two chapters, we abstract the instrumented interpreter and restate the safety condition 
in terms of the abstract interpreter. The next two chapters restrict storable values to include only 
tuples so that we can concentrate on the process of abstraction and how to state and test the 
deallocation safety condition. Later in the thesis we add the abstraction of other types of objects 
to our abstract interpreter. 






Chapter 4 



Abstracted Semantics 



The purpose of abstract interpretation is to capture information about the execution of an
expression or program over all possible data. We summarize, or abstract, the values produced by a
program over all executions of the program. For instance, if a program evaluates to a number
under the standard or instrumented interpreters, our abstract interpreter summarizes its result as $N$,
meaning any number. Similarly, our abstract interpreter evaluates both branches of all conditionals
in order to summarize the behavior of the conditionals. In this way, the behavior over all control
paths and over all data can be approximated.

The abstract semantics that we use captures information about the shape and identities of objects 
that are allocated and the dynamic reachability of these objects from the variables and structures 
to which they are bound. In Chapter 5, we use this information about reachability and object
identities to develop algorithms to verify the safety of deallocation commands and to insert
deallocation commands in KID - programs.

In the rest of this chapter, we develop an abstracted interpreter that summarizes the behavior of
programs in such a way that we can determine object lifetimes. In the first section, we briefly
describe how the abstract interpreter is used. In the second section, we define the abstract value
domains for this abstract interpreter. In the third section, we describe the evaluation strategy
used by our abstract interpreter. In the fourth section, we define the abstract interpreter itself. In
the final section, we show some examples of using this interpreter to determine that the lifetimes of
particular objects are bounded by the lifetimes of given procedure invocations.

4.1 Using the Abstract Interpreter 

Our abstract interpreter does not directly yield lifetime information. It computes the shape of 
the objects that a program may allocate and how they may be interconnected rather than the 
actual values that may fill those objects. However, a lifetime analyzer uses the connectivity, or 
reachability, information to determine the approximate lifetimes of objects. This intuition led to
the development of our abstract interpreter and is also the reason why our abstract interpreter is 
general enough to be used for other analyses. 

To perform lifetime analysis on a procedure, we need to know all the possible values to which each 
variable may be bound and all possible values each object may contain. The questions we ask to 
determine object lifetimes are, "When is the first possible time that this object is reachable from 
the running program?" and, "When is the last possible time that this object is reachable from the
running program?" Thus, we must be able to know all possible places from which we can reference 
an object. In the remainder of this section we discuss the precision of this reachability information 
and how to use that information to determine that deallocation commands are safe. 



4.1.1 Precision of Information 

The reachability information generated by the abstract interpreter is approximate. However, the 
imprecision is asymmetrical — a negative result is definite, while a positive result is indefinite. 
The most precise fact we can determine is that an object is not reachable from a given variable. 
If the abstract interpreter determines that an abstract object is not reachable from the result of a 
procedure invocation, then under no circumstances will that object be reachable during execution 
of the procedure invocation under the standard interpreter. We must be very careful to base all
of our decisions on precise negative information rather than approximate positive information. For
example, if we determine that variable $x$ may be bound to some set of abstract objects labeled $ls$,
then $x$ may be bound to $\bot$, or to one of the locations in $ls$, but $x$ will definitely not be bound to a
location outside of $ls$.

Given this insight into the kinds of questions we may ask about object reachability, let us reexamine 
the three conditions that an object must satisfy in order to be safely deallocated within a dynamic 
context. First, the object must have been allocated within the context. In other words, the object 
must not have been inherited, or passed in from a surrounding context. We verify this by testing 
that the object cannot be reached from a surrounding context. Since we are talking about the 
binding of a variable, we must actually test that none of the objects to which the variable could 
be bound can be reached from a surrounding context. Next, the object must not escape from this 
context. We verify this by testing that none of the object labels to which this variable could be 
bound are reachable from the result of the context of interest. Finally, this object must not be 
deallocated more than once. We test this by verifying that none of the object labels to which this 
identifier may be bound are in the set of object labels that may be deallocated by other deallocation 
commands. 

4.1.2 The Abstract Deallocation Safety Condition 

The canonical form of a block expression is shown below:

    { x_0 = e_0 ;
      ...
      x_{n-1} = e_{n-1} ;
      Dealloc(y_0) ;
      ...
      Dealloc(y_{m-1}) ;
    in x }

We use $x_i$ for the names of the bound variables, $y_i$ for the variables in deallocation commands,
and $w_i$ for the free variables of a letrec expression. Here, each variable $x_i$ is bound in the block
expression. The object to which each of the $y_i$'s is bound will be deallocated once the bindings
have completely evaluated, and the value of $x$ is returned as the result of the block expression.

The context in which an expression is evaluated provides the environment, store, and activation
label in which the expression is executed. Let us consider a block expression $e$ evaluated in the
standard context $c = (\rho, \sigma, a)$. If there are deallocation commands for $y_1, \ldots, y_n$ in the top level
of the block expression, then we can verify that these deallocation commands are correct under
the standard interpreter as follows. First, we determine what locations are passed into $e$ from the
context. Call this set $I$.

    \[ I = \bigcup_{w \in FV(e)} \mathit{Reachable}(\rho[w], \sigma) \qquad (4.1) \]

We can use either $\sigma$ or $\sigma'$ here because the language is functional. If side effects were added we
would have to use $\sigma'$, although we would still use environment $\rho$.

We evaluate the bindings of the block expression, yielding the environment and store for the
evaluation of the body of the expression; call these $\rho'$ and $\sigma'$. The resulting value of this evaluation
is $(\rho'[x], \sigma')$, where variable $x$ is the result of the block expression.

Now, we must also determine the set R, which is the set of objects reachable from the result of the 
evaluation of e in context c. 

    \[ R = \mathit{Reachable}(\rho'[x], \sigma') \qquad (4.2) \]

where x is the result of the block expression above. 

Given this exact information from the standard evaluator ($\rho$, $\sigma$, $I$, $R$, $\rho'$, and $\sigma'$), we can
determine that it is safe to execute each of the deallocation commands in $e$ in a particular dynamic
context.

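    \[ \mathit{Safe?}([\![ \mathit{Dealloc}(y_i) ]\!]) \;=\; \rho'[y_i] \notin I \;\land\; \rho'[y_i] \notin R \;\land\; \forall j \neq i.\ \rho'[y_i] \neq \rho'[y_j] \qquad (4.3) \]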

In other words, each deallocation is guaranteed to be correct if the value of $y_i$ (the object being
deallocated) is not inherited from the context (it must have been allocated within $e$), is not
returned as part of the result of $e$, and is not deallocated by any other deallocation command.

In order to verify deallocation safety in the abstract interpreter, we must perform a similar test.
So, starting with an abstract context $(\rho, \sigma, a)$, we must evaluate the bindings of $e$ to obtain the
environment and store ($\rho'$ and $\sigma'$) of the body of the block expression, and then compute sets $I$
and $R$:

    \[ I = \bigcup_{w \in FV(e)} \mathit{Reachable}(\rho[w], \sigma) \qquad (4.4) \]

    \[ R = \mathit{Reachable}(\rho'[x], \sigma') \qquad (4.5) \]

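Given all of these abstract values, we can conservatively determine safety using the following
procedure:

    \[
    \mathit{Safe?}([\![ \mathit{Dealloc}(y_i) ]\!]) \;=\;
    \left(
    \begin{array}{l}
    \rho'[y_i] \cap I = \emptyset \\
    \land\; \rho'[y_i] \cap R = \emptyset \\
    \land\; \bigwedge_{y_j \neq y_i} \rho'[y_i] \cap \rho'[y_j] = \emptyset
    \end{array}
    \right)
    \qquad (4.6)
    \]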

Note that instead of testing for object $ol$ not being in sets $I$ and $R$, we now must test that none of
the labels in the value of $y_i$ are in sets $I$ or $R$. Also, we must test for pairwise disjointness of the
values to be deallocated rather than testing for pairwise inequality of object labels.
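
The abstract test (4.6) amounts to three disjointness checks on sets of labels. The sketch below is a hypothetical Python rendering, assuming that the abstract value of every deallocated variable is a set of label names (the empty set standing in for bottom) and that the sets I and R have already been computed; the function name `safe_deallocs` and its parameters are illustrative, not taken from the thesis.

```python
def safe_deallocs(dealloc_vars, env, inherited, result_reachable):
    """Apply test (4.6) to each variable named in a Dealloc command.

    env maps variable names to sets of abstract object labels (assumed
    representation); inherited is the set I, result_reachable the set R.
    """
    verdict = {}
    for y in dealloc_vars:
        labels = env[y]
        # no label of y may also be deallocated through another variable
        disjoint_from_others = all(
            labels.isdisjoint(env[other])
            for other in dealloc_vars if other != y
        )
        verdict[y] = (
            labels.isdisjoint(inherited)              # not passed in from the context
            and labels.isdisjoint(result_reachable)   # does not escape via the result
            and disjoint_from_others
        )
    return verdict
```

Because the label sets over-approximate what each variable may denote, a `False` verdict does not prove that the deallocation is incorrect; only a `True` verdict is definite.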

In Chapter 5, we develop algorithms for verifying and inserting object deallocation commands. In 
that chapter we see how all of the necessary values are computed using the abstract interpreter. 

4.2 Abstracting the Semantic Domains

The abstract interpreter is supposed to allow us to compute or approximate the value of a useful 
property of a program. We are interested in knowing which objects can be reached from each of 
the variables in the program. 

4.2.1 Abstract Domains 

Figure 4.1 contains the definition of the domains used by the abstract interpreter. We describe 
these domains in more detail in the remainder of this section. 

Activation Labels 

We summarize all activation labels in the standard domain of activation labels by the empty
activation label $\epsilon$. This abstraction of activation labels is the most extreme way of ensuring a finite
domain. In Chapter 9 we investigate more precise abstractions of this domain.

The domain L is the set of static labels attached to expressions in a program. This domain is finite; 
its size is determined by the number of MakeTuple expressions appearing in the program. 

Object Labels 

Abstract object labels are composed of an abstract activation label and a static expression label. 
Since both the AL and L domains are finite, the domain of object labels OL must also be finite. 

Under abstract interpretation, a variable may have a set of objects to which it may be bound
because execution of an expression in different contexts may bind variables to different objects.
Thus, object references must be sets of object labels.

Denotable Values 

The domains N and B of integers and booleans have been compressed to a single element each 
because we are uninterested in the actual values computed — only in the shape and connectedness 
of the values computed. 

Values are either scalars, e.g., integers or booleans, or references to aggregates, e.g., tuples. An
aggregate value consists of a reference to the tuple and a store containing the value of the aggregate.
A reference consists of a set of object labels $ls$. The domain $V$ of denotable values therefore consists
of the sum of abstract integers, booleans and sets of object labels, all lifted over a bottom element
$\bot$. Note that no objects are reachable from $\bot$.

Stores 

Stores map individual labels to tuples. Location $ol$ being unbound in a store $\sigma$ is the same as
having $ol$ bound to $\bot$ in $\sigma$. In the abstract semantics, we use sets of labels $ls$ as references to an
object. We dereference such a set of labels as follows:

    \[ \langle \mathit{Tuple}\ c_1, \ldots, c_n \rangle = \bigsqcup_{ol \in ls} \sigma[ol] \]

Figure 4.1: Abstract value domains



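    \[
    \begin{array}{rcl}
    \mathit{Abs}(\bot) &=& \bot \\
    \mathit{Abs}_{AL}(a) &=& \epsilon \\
    \mathit{Abs}_{OL}(a : l) &=& \epsilon : l \\
    \mathit{Abs}_{LS}(ls) &=& \bigcup_{ol \in ls} \{ \mathit{Abs}_{OL}(ol) \} \\
    \mathit{Abs}_{V}(v) &=&
      \begin{cases}
      \bot & \text{if } v = \bot \\
      N & \text{if } v \text{ is a number} \\
      B & \text{if } v \text{ is a boolean} \\
      \{ \mathit{Abs}_{OL}(v) \} & \text{if } v \text{ is a location} \\
      \top & \text{otherwise}
      \end{cases} \\
    \mathit{Abs}_{Tuple}(\langle \mathit{Tuple}\ v_1, \ldots, v_n \rangle) &=& \langle \mathit{Tuple}\ \mathit{Abs}_{V}(v_1), \ldots, \mathit{Abs}_{V}(v_n) \rangle \\
    \mathit{Abs}_{Store}(\sigma) &=& \bigsqcup_{ol \in OL} \bot_{Store}[\, \mathit{Abs}_{OL}(ol) \mapsto \mathit{Abs}_{Tuple}(\sigma[ol]) \,]
    \end{array}
    \]
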
Additional abstraction operators we require are defined below: 

    \mathit{Abs}_{AEV}(A^{+}) =

    \mathit{Abs}_{DEV}(A^{-}) = \{ \mathit{Abs}_{OL}(ol) \mid (ol, a) \in A^{-} \}

    \mathit{Abs}_{REV}(A^{R}) =



Figure 4.2: Definition of the abstraction functions 



We are determining the tuple to which store $\sigma$ maps the object labels in $ls$ by taking the least
upper bound of the tuples to which $\sigma$ maps each label in $ls$.

Please remember that denotable values that are object references, or labels, are meaningless without 
an associated store. Although the labels themselves are very important in this semantics, the 
true meaning of a denotable value is tied to the object in the store named by the value's set of 
labels. Similarly, the set of labels of objects allocated and deallocated only has any meaning when 
accompanied by a store in which the allocated and deallocated objects reside. 



Abstraction Functions 
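
    \[
    \begin{array}{rcl}
    \bot \sqcup v &=& v \\
    v \sqcup \bot &=& v \\[4pt]
    v_1 \sqcup_{V} v_2 &=&
      \begin{cases}
      N & \text{if } v_1, v_2 \text{ are both numbers} \\
      B & \text{if } v_1, v_2 \text{ are both booleans} \\
      v_1 \sqcup_{LS} v_2 & \text{if } v_1, v_2 \text{ are both } LS \\
      \top & \text{otherwise}
      \end{cases} \\[4pt]
    ls_1 \sqcup_{LS} ls_2 &=& ls_1 \cup ls_2 \\[4pt]
    \langle \mathit{Tuple}\ v_1, \ldots, v_{n_1} \rangle \sqcup_{Tuple} \langle \mathit{Tuple}\ w_1, \ldots, w_{n_2} \rangle &=&
      \begin{cases}
      \langle \mathit{Tuple}\ (v_1 \sqcup_{V} w_1), \ldots, (v_n \sqcup_{V} w_n) \rangle & \text{if } n_1 = n_2 \\
      \top & \text{otherwise}
      \end{cases} \\[4pt]
    (\sigma_1 \sqcup_{Store} \sigma_2)[ol] &=&
      \begin{cases}
      \sigma_1[ol] \sqcup_{Tuple} \sigma_2[ol] & \text{if } ol \in OL \\
      \bot & \text{otherwise}
      \end{cases}
    \end{array}
    \]

Figure 4.3: Least upper bound operators on value domains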



Figure 4.2 contains the definitions of the abstraction functions that map values in the standard 
domains to values in the abstract domains. We need these functions in order to show the correctness 
of the abstract interpreter. 
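
To make the abstraction functions concrete, the following Python sketch maps a small standard store to an abstract one. It is an illustration under stated assumptions rather than the thesis implementation: concrete bottom is modeled as `None`, concrete locations as a hypothetical `Loc(activation, label)` record, the empty activation label as the empty tuple, and the constants `BOT`, `NUM`, `BOOL`, `TOP` are invented names for the abstract scalars.

```python
from typing import NamedTuple

BOT, NUM, BOOL, TOP = "bot", "N", "B", "top"   # assumed encodings
EPSILON = ()                                    # the empty activation label


class Loc(NamedTuple):
    activation: tuple   # standard activation label
    label: str          # static expression label


def abs_ol(loc):
    """Abs_OL: forget the activation, keep the static label."""
    return Loc(EPSILON, loc.label)


def abs_value(v):
    """Abs_V: numbers and booleans collapse; a location becomes a singleton set."""
    if v is None:
        return BOT
    if isinstance(v, Loc):
        return frozenset({abs_ol(v)})
    if isinstance(v, bool):          # test bool first: bool is a subclass of int
        return BOOL
    if isinstance(v, int):
        return NUM
    return TOP


def abs_tuple(tup):
    return tuple(abs_value(v) for v in tup)


def lub_value(v1, v2):
    if v1 == BOT:
        return v2
    if v2 == BOT:
        return v1
    if isinstance(v1, frozenset) and isinstance(v2, frozenset):
        return v1 | v2
    return v1 if v1 == v2 else TOP


def lub_tuple(t1, t2):
    if t1 is None:
        return t2
    if t2 is None:
        return t1
    if len(t1) != len(t2):
        return TOP
    return tuple(lub_value(a, b) for a, b in zip(t1, t2))


def abs_store(store):
    """Abs_Store: join the abstract tuples of all objects sharing a label."""
    result = {}
    for loc, tup in store.items():
        key = abs_ol(loc)
        result[key] = lub_tuple(result.get(key), abs_tuple(tup))
    return result
```

Two objects allocated by the same expression in different activations end up under one abstract label, which is why abstract references must be interpreted as "may refer to" rather than "refers to".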

4.2.2 Least Upper Bound Operators 

Figure 4.3 contains the definitions of the least-upper-bound operators on the abstract domains. 
The domains are all naturally ordered. 
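
The least-upper-bound operators, together with the dereferencing rule for a set of labels, can be sketched in Python as follows. The encoding is an assumption made for illustration: abstract scalars are the strings `"N"` and `"B"`, bottom and top are `"bot"` and `"top"`, label sets are `frozenset`s, tuples are Python tuples, and an unbound store entry (bottom) is `None`.

```python
BOT, NUM, BOOL, TOP = "bot", "N", "B", "top"   # assumed encodings


def lub_value(v1, v2):
    """Join two denotable values."""
    if v1 == BOT:
        return v2
    if v2 == BOT:
        return v1
    if v1 == NUM and v2 == NUM:
        return NUM
    if v1 == BOOL and v2 == BOOL:
        return BOOL
    if isinstance(v1, frozenset) and isinstance(v2, frozenset):
        return v1 | v2          # label sets join by union
    return TOP


def lub_tuple(t1, t2):
    """Join two tuples componentwise; None models an unbound (bottom) tuple."""
    if t1 is None:
        return t2
    if t2 is None:
        return t1
    if t1 == TOP or t2 == TOP or len(t1) != len(t2):
        return TOP
    return tuple(lub_value(a, b) for a, b in zip(t1, t2))


def lub_store(s1, s2):
    """Join two stores pointwise over every label bound in either."""
    return {ol: lub_tuple(s1.get(ol), s2.get(ol)) for ol in s1.keys() | s2.keys()}


def deref(labels, store):
    """Dereference a set of labels: the join of the tuples they may denote."""
    result = None
    for ol in labels:
        result = lub_tuple(result, store.get(ol))
    return result
```

For instance, dereferencing `{l0, l1}` in a store where `l0` holds `(N, bot)` and `l1` holds `(bot, N)` yields `(N, N)`.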

4.2.3 Reachability

The abstract interpretation of a program yields a model of what objects are created and what 
objects are reachable from the bindings of a letrec block. Because the abstract interpreter sum- 
marizes information about all executions of an expression, we must represent references to objects 
as sets of abstract object labels. To restate the invariant on abstract reachability, we say that if
variable $x$ is bound to a set of locations $ls$ then, in any given execution, $x$ can be bound to no other
locations. This reachability invariant is a constraint on the structure of the abstract object label
domain. The abstraction function $\mathit{Abs}_{OL}$ that maps object labels to abstract object labels must
enable us to preserve this constraint.

We need a precise notion of reachable objects in the abstract domains. Given a denotable value 
and a store, we must be able to determine which objects are reachable from that value and store. 

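Definition 4.1 (Abstract Object Reachability) $\mathit{Reachable}(v, \sigma)$, the set of labels reachable
from value $v$ in store $\sigma$, is defined as follows:

    \[
    \begin{array}{rcl}
    \mathit{Reachable}(\bot, \sigma) &=& \emptyset \\
    \mathit{Reachable}(N, \sigma) &=& \emptyset \\
    \mathit{Reachable}(B, \sigma) &=& \emptyset \\
    \mathit{Reachable}(ls, \sigma) &=& ls \cup \left( \bigcup_{ol \in ls} \mathit{SVReachable}(\sigma[ol], \sigma) \right) \\[6pt]
    \mathit{SVReachable}(\bot, \sigma) &=& \emptyset \\
    \mathit{SVReachable}(\langle \mathit{Tuple}\ v_1, \ldots, v_n \rangle, \sigma) &=& \bigcup_{i=1}^{n} \mathit{Reachable}(v_i, \sigma)
    \end{array}
    \]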
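
Definition 4.1 can be computed by a simple graph traversal. The Python sketch below is a hypothetical rendering, assuming that label sets are `frozenset`s, scalars and bottom are any non-set values, tuples are Python tuples, and the store is a dictionary in which a missing entry stands for an unbound label. It uses a worklist with a visited set instead of the direct recursion of the definition; that is a choice made here so that the traversal also terminates on stores whose objects refer to one another cyclically.

```python
def reachable(value, store):
    """Return the set of labels reachable from value in store."""
    if not isinstance(value, frozenset):
        return set()                      # bottom, N and B reach nothing
    seen = set()
    worklist = list(value)
    while worklist:
        ol = worklist.pop()
        if ol in seen:
            continue
        seen.add(ol)
        tup = store.get(ol)               # None models an unbound label
        if tup is None:
            continue
        for component in tup:
            if isinstance(component, frozenset):
                worklist.extend(component)
    return seen
```

The sets I and R used by the deallocation safety test are both obtained by calls of this form, one per free variable for I and one on the block result for R.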



4.2.4 Ordering Operators on Domains 

Figure 4.4 contains the definitions of the ordering operators for each of the abstract domains. 
All of the domains are naturally ordered. These operators are necessary to show correctness and 
termination of the abstract interpreter. 

The domain orderings on labels are by name. We consider the set $\{l_0\}$ to be less than $\{l_0, l_1\}$
regardless of what those locations may be bound to in a given store.

We say store $\sigma_1$ is less than store $\sigma_2$ if, for all labels $ol$ in the universe of object labels $OL$, the tuple
to which $ol$ is bound in $\sigma_1$ is less than the tuple to which $ol$ is bound in $\sigma_2$. Tuples are compared
element-by-element using the value ordering described above. Again, sets of labels are ordered by
name, not by the values to which they may refer.
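
The orderings described in this section can be sketched as follows. This Python fragment follows the prose above only: it assumes a flat ordering on the abstract scalars with bottom below and top above everything, set inclusion on label sets, componentwise comparison of tuples, and pointwise comparison of stores. The encodings (`"bot"`, `"N"`, `"B"`, `"top"`, `None` for an unbound tuple) are illustrative assumptions, and the functions implement the reflexive "less than or equal" form of each ordering.

```python
BOT, NUM, BOOL, TOP = "bot", "N", "B", "top"   # assumed encodings


def leq_value(v1, v2):
    """Value ordering: bottom is least, top is greatest, label sets by inclusion."""
    if v1 == BOT or v2 == TOP:
        return True
    if isinstance(v1, frozenset) and isinstance(v2, frozenset):
        return v1 <= v2                 # ordered by name, i.e. by subset
    return v1 == v2


def leq_tuple(t1, t2):
    """Tuple ordering; None models an unbound (bottom) tuple."""
    if t1 is None or t2 == TOP:
        return True
    if t2 is None or t1 == TOP:
        return False
    return len(t1) == len(t2) and all(leq_value(a, b) for a, b in zip(t1, t2))


def leq_store(s1, s2):
    """Store ordering: compare the tuple bound to every label pointwise."""
    return all(leq_tuple(s1.get(ol), s2.get(ol)) for ol in s1.keys() | s2.keys())
```

A fixpoint test on input-output mappings needs only equality, but monotonicity arguments and the extensivity property of stores are stated in terms of comparisons like these.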



4.3 Abstracting the Interpreter 

In the abstract interpreter, we cannot evaluate procedure calls by unfolding the body of the called
procedure because this would never terminate if any of the procedures were recursive. Instead, the
abstract interpreter constructs an input-output mapping for each procedure in a program. This
mapping describes the behavior of a procedure over each possible set of inputs. We stress that the
function mapping only approximately describes the behavior of the function.

Figure 4.4: Ordering operators on domains

When we abstract the interpreter, we make a major change to the clause that evaluates procedure 
applications so that it looks up the result of a procedure application in the input-output mapping 
corresponding to the procedure being applied. The job of the program evaluator, the procedure that 
interprets programs, is to compute the input-output mappings for each procedure. The program 
evaluator iterates a function that improves the approximation of the input-output mappings of each 
procedure until this iteration reaches fixpoint. We describe this process in the remainder of this 
section. 



4.3.1 Computation of Input-Output Mappings 

A KID - program can be viewed as a set of recursive function definitions. These definitions may
be viewed as a set of equations defining the values of the functions, where the value of a function
$f_i$ is a mapping from values in the domain of $f_i$ to values in the range of $f_i$. A typical system of
function definition equations is shown below.

    \[
    \begin{array}{rcl}
    f_1(x_1, \ldots, x_n) &=& \cdots f_i(\cdots) \cdots \\
    & \vdots & \\
    f_m(x_1, \ldots, x_n) &=& \cdots f_i(\cdots) \cdots
    \end{array}
    \]
If the system of equations is monotonic with respect to the values of the functions, and the heights
of all chains in the domains of the functions are bounded, then we can solve this system of equations
by using fixpoint iteration. We start with an initial approximation to the solution and generate
successively improved approximations until we reach an approximation equal to its improved
approximation; this is the exact solution to the system of equations.

We start the fixpoint iteration by using an initial approximation of each function that returns
bottom for all input values.

    \[
    \begin{array}{rcl}
    f_1^{0}(x_1, \ldots, x_n) &=& \bot \\
    & \vdots & \\
    f_m^{0}(x_1, \ldots, x_n) &=& \bot
    \end{array}
    \]

We can use bottom as the initial approximation to function $f_i$ because it is a safe approximation
to the behavior of unfolding each function application zero times. In general, the value of $f_i^{k}$ is a
safe approximation of the behavior of each function unfolded $k$ times, even though it might not be
a safe approximation of unfolding each function $k + 1$ times. The value of $f_i^{\infty}$, however, is a safe
approximation to the behavior of function $f_i$ over any depth of unfolding.

At the $(k+1)$th step in the fixpoint iteration, we substitute the $k$th approximation to function $f_i$,
$f_i^{k}$, for each use of $f_i$ in each equation. The substitution yields the $(k+1)$th approximation to the
functions, as shown below:

    \[
    \begin{array}{rcl}
    f_1^{k+1}(x_1, \ldots, x_n) &=& \cdots f_i^{k}(\cdots) \cdots \\
    & \vdots & \\
    f_m^{k+1}(x_1, \ldots, x_n) &=& \cdots f_i^{k}(\cdots) \cdots
    \end{array}
    \]

Fixpoint iteration terminates when $f_i^{k+1} = f_i^{k}$ for all functions $f_i$ and all possible input values. It
is guaranteed to terminate when the domains and ranges of the functions are all finite and all the
functions are monotonic.

We can view this process as finding the solution to the following equation: 

    \[ \langle f_1, \ldots, f_m \rangle = Y(F) \]

where F is the function that takes an approximation of each of the functions fi and returns a 
refined approximation to each of the functions, and Y is the least fixpoint operator. 
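
The iteration described in this section can be sketched generically. In the Python fragment below, an approximation is any value that can be compared for equality (here a dictionary acting as a function table), and `refine` plays the role of F. The names `least_fixpoint` and `example_refine`, and the toy equation system, are assumptions made for illustration; termination relies on the same premises as in the text, namely a monotonic `refine` over a domain with bounded chains.

```python
BOT, NUM = "bot", "N"   # assumed encodings of bottom and "any number"


def least_fixpoint(refine, initial):
    """Iterate refine from the initial approximation until nothing changes."""
    current = initial
    while True:
        improved = refine(current)
        if improved == current:
            return current
        current = improved


def example_refine(table):
    """One refinement step for the toy system f(0) = N, f(n) = f(n - 1).

    Each use of f on the right-hand side is replaced by the previous
    approximation, exactly as in the substitution step described above.
    """
    return {n: (NUM if n == 0 else table[n - 1]) for n in table}


# Start from the approximation that maps every input to bottom.
initial_table = {n: BOT for n in range(4)}
solution = least_fixpoint(example_refine, initial_table)
```

In this toy system the k-th approximation is defined on inputs smaller than k, which mirrors the remark that the k-th approximation is safe for k levels of unfolding.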

4.3.2 Finiteness of the KID - Abstract Domains

Although the domain of tuples is not finite, the domains of any particular function must be finite 
because the functions are strongly typed (monomorphically typed). The set of labels L is finite 
because programs are finite, and the depth of nesting of tuples passed as an argument to a function 
depends on the type of that function. The same is true of the result of each function — the depth of 
nesting of the tuples returned and the size of the sets of labels returned are both finite. Therefore, 
the fixpoint iteration described above must terminate. The solution to the recursive set of equations 
exists because the fixpoint iteration described above must terminate. 

Representing Values of Functions

In the KID - abstract interpreter, function values are represented by mappings from products of 
denotable values and a store to pairs consisting of a denotable value and a store. The signature of 
these mappings is 

    \[ \mathit{Fcn} = (V^{*} \times \mathit{Store}) \to (V \times \mathit{Store}) \]

These mappings can be thought of as a table, or set of tuples, consisting of input values and the 
corresponding output values. 

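Let us consider an example function, $f$, whose type and function mapping type are given below.
If $f$ has type $\langle \mathit{Tuple}\ N, \langle \mathit{Tuple}\ N, N \rangle \rangle \to N$, then the Fcn mapping associated with procedure $f$ will
have type $(V \times \mathit{Store}) \to (V \times \mathit{Store})$.

    \[ f : \langle \mathit{Tuple}\ N, \langle \mathit{Tuple}\ N, N \rangle \rangle \to N \]
    \[ f_{\mathit{Mapping}} : (V \times \mathit{Store}) \to (V \times \mathit{Store}) \]

Assume that there are only two locations, $l_0$, of type $\langle \mathit{Tuple}\ N, \langle \mathit{Tuple}\ N, N \rangle \rangle$, and $l_1$, of type $\langle \mathit{Tuple}\ N, N \rangle$:

    \[ L = \{\, l_0^{\langle \mathit{Tuple}\ N, \langle \mathit{Tuple}\ N, N \rangle \rangle},\; l_1^{\langle \mathit{Tuple}\ N, N \rangle} \,\} \]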

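Given this knowledge, we can enumerate all values in the domain of $f$:

    \[
    \left\{ \begin{array}{l} \bot \\ \{l_0\} \end{array} \right\}
    \times \bot_{Store}
    \left[
    \begin{array}{l}
    l_0 \mapsto \bot \\
    l_0 \mapsto \langle \mathit{Tuple}\ \bot, \bot \rangle \\
    l_0 \mapsto \langle \mathit{Tuple}\ \bot, \{l_1\} \rangle \\
    l_0 \mapsto \langle \mathit{Tuple}\ N, \bot \rangle \\
    l_0 \mapsto \langle \mathit{Tuple}\ N, \{l_1\} \rangle
    \end{array}
    \times
    \begin{array}{l}
    l_1 \mapsto \bot \\
    l_1 \mapsto \langle \mathit{Tuple}\ \bot, \bot \rangle \\
    l_1 \mapsto \langle \mathit{Tuple}\ N, \bot \rangle \\
    l_1 \mapsto \langle \mathit{Tuple}\ \bot, N \rangle \\
    l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle
    \end{array}
    \right]
    \]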



The '$\times$' signs should be read as the cross product of the possibilities for the three portions of the
input domain: the value, the binding of label $l_0$ in the store, and the binding of label $l_1$ in the store.
For example, the least defined element in the domain of $f$'s mapping is:

    \[ ( \bot, \bot_{Store} ) \]

and the most defined element is:

    \[ ( \{l_0\}, \bot_{Store}[\, l_0 \mapsto \langle \mathit{Tuple}\ N, \{l_1\} \rangle,\; l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle \,] ) \]



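Here is the range of possible results returned by function $f$:

    \[
    \left\{ \begin{array}{l} \bot \\ N \end{array} \right\}
    \times \bot_{Store}
    \left[
    \begin{array}{l}
    l_0 \mapsto \bot \\
    l_0 \mapsto \langle \mathit{Tuple}\ \bot, \bot \rangle \\
    l_0 \mapsto \langle \mathit{Tuple}\ \bot, \{l_1\} \rangle \\
    l_0 \mapsto \langle \mathit{Tuple}\ N, \bot \rangle \\
    l_0 \mapsto \langle \mathit{Tuple}\ N, \{l_1\} \rangle
    \end{array}
    \times
    \begin{array}{l}
    l_1 \mapsto \bot \\
    l_1 \mapsto \langle \mathit{Tuple}\ \bot, \bot \rangle \\
    l_1 \mapsto \langle \mathit{Tuple}\ N, \bot \rangle \\
    l_1 \mapsto \langle \mathit{Tuple}\ \bot, N \rangle \\
    l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle
    \end{array}
    \right]
    \]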



Note that because $f$ is an extensive function with respect to the store, the result store must contain
the input store. For example, if $f$ was applied to the store:

    \[ \bot_{Store}[\, l_0 \mapsto \langle \mathit{Tuple}\ N, \bot \rangle,\; l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle \,] \]

then the only possible result stores are:

    \[ \bot_{Store}[\, l_0 \mapsto \langle \mathit{Tuple}\ N, \bot \rangle,\; l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle \,] \]

and

    \[ \bot_{Store}[\, l_0 \mapsto \langle \mathit{Tuple}\ N, \{l_1\} \rangle,\; l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle \,] \]

because all other stores in the range of $f$ are less than or incomparable to the input store.

    { def foo(t) =
        { a = Select_1(t) ;
          b = Select_2(t) ;
          p = (a == 5) ;
          r = if p then b
              else { t' = ^{l_0}MakeTuple(5,7) ;
                     v = ^{fc_0}foo(t') ;
                     in v }
          in r }
      def f_0 =
        { t0 = ^{l_1}MakeTuple(..., ...)
          result = ^{fc_1}foo(t0)
          in result }
    }

Figure 4.5: A recursive example

Computation of Function Mapping for an Example 

Now let us go through the steps of constructing a mapping for the recursive example described in
the previous section. Figure 4.5 contains a program consisting of a recursive procedure foo and a
call to foo from the main expression of the program. First, let us examine the domain of foo and
the type of the mapping we will construct for foo. Then we will work informally through the steps
of building the mapping. Because this program diverges if we try to unfold the procedure calls each
time the interpreter encounters a procedure application, we have to compute the fixpoint of the
function that takes the initial input-output mapping of the function (the empty function mapping)
and produces the final function mapping.

Figure 4.6 contains the domains of foo. Let us walk through the computation of the function 
mapping for foo, step by step. We start by computing the value of foo on its least defined input, 
and iterate until we reach fixpoint. 

We only compute the value of foo on an input value if that value arises during the abstract 
interpretation of the program. This set of values is what we consider the interesting portion of the 
domain of foo. During each iteration we compute a new approximation of the mapping of foo, 
and we keep track of all values to which foo has been applied. 

In order to compute the function mapping for foo, we start with an initial approximation that maps 
all inputs to bottom, then we use our current approximation to compute successively improved 
approximations until our approximation does not change. 

Figure 4.6: Domain of a function foo



To compute a better approximation, we evaluate the body of procedure foo with each set of inputs 
that appears in the current approximation of the mapping to compute new output values. We 
record the new output values in the improved approximation of foo's mapping. As we evaluate the
body of foo, if we encounter applications of foo to input values that do not occur in the interesting 
domain, we add these values to the set of values in the interesting domain. 

The initial approximation for foo returns $\bot$ for all input values. We can then initiate the abstract
interpretation of foo by evaluating the body of the main procedure $f_0$. We encounter a call with
arguments:

    \[ ( \{l_1\}, [\, l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle \,] ) \]

The result from this application is approximated by $\bot$, and we add this value to the interesting
domain of foo.

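To compute our next approximation to the mapping for foo, we evaluate its body on the single
value in the interesting domain. We get the following mapping:

    \[
    ( \{l_1\}, [\, l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle \,] ) \;\mapsto\;
    ( N, [\, l_0 \mapsto \langle \mathit{Tuple}\ N, N \rangle,\; l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle \,] )
    \]

and we add the input value

    \[ ( \{l_0\}, \bot_{Store}[\, l_0 \mapsto \langle \mathit{Tuple}\ N, N \rangle,\; l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle \,] ) \]

to foo's interesting domain, because a call to foo with these input values was encountered during
the computation of the previous approximation.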

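And after one more iteration we reach the following approximation for foo:

    \[
    \begin{array}{l}
    ( \{l_0\}, [\, l_0 \mapsto \langle \mathit{Tuple}\ N, N \rangle,\; l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle \,] ) \;\mapsto\;
    ( N, [\, l_0 \mapsto \langle \mathit{Tuple}\ N, N \rangle,\; l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle \,] ) \\[4pt]
    ( \{l_1\}, [\, l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle \,] ) \;\mapsto\;
    ( N, [\, l_0 \mapsto \langle \mathit{Tuple}\ N, N \rangle,\; l_1 \mapsto \langle \mathit{Tuple}\ N, N \rangle \,] )
    \end{array}
    \]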



One more iteration yields the same value. Since we are not adding more entries to the interesting 
portion of the domain, and the values of each of the mapping entries have not changed, we have 
reached the fixpoint for foo projected onto the domain consisting of those inputs with the bindings 
shown in the mapping above. 
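
The walk-through above can be replayed mechanically. The Python sketch below hand-codes an abstract version of the body of foo from Figure 4.5 and iterates over the interesting domain until neither the mapping nor the domain changes. It is an illustration under stated assumptions, not the thesis implementation: the encodings `"bot"` and `"N"`, the helper names (`freeze`, `select`, `foo_body`, `analyze_foo`), and the choice to model an unknown call result as bottom with an empty store are all hypothetical. The only fact used about the tuple built in the main procedure is that its abstract value is a pair of numbers, as in the call arguments shown above.

```python
BOT, NUM, TOP = "bot", "N", "top"   # assumed encodings


def lub_value(v1, v2):
    if v1 == BOT:
        return v2
    if v2 == BOT:
        return v1
    if isinstance(v1, frozenset) and isinstance(v2, frozenset):
        return v1 | v2
    return v1 if v1 == v2 else TOP


def lub_tuple(t1, t2):
    if t1 is None:
        return t2
    if t2 is None:
        return t1
    return tuple(lub_value(a, b) for a, b in zip(t1, t2))


def lub_store(s1, s2):
    return {ol: lub_tuple(s1.get(ol), s2.get(ol)) for ol in s1.keys() | s2.keys()}


def freeze(store):
    """Turn a store dictionary into a hashable, order-independent key."""
    return tuple(sorted(store.items()))


def select(index, labels, store):
    """Abstract Select: join the chosen component over all possible tuples."""
    result = BOT
    for ol in labels:
        tup = store.get(ol)
        if tup is not None:
            result = lub_value(result, tup[index])
    return result


def foo_body(arg, store, lookup):
    """Abstract body of foo: both branches are evaluated and joined."""
    b = select(1, arg, store)                           # then-branch result
    else_store = lub_store(store, {"l0": (NUM, NUM)})   # t' allocated at label l0
    v, callee_store = lookup(frozenset({"l0"}), else_store)
    return lub_value(b, v), lub_store(else_store, callee_store)


def analyze_foo():
    mapping = {}
    # The main procedure calls foo on the tuple allocated at label l1.
    domain = {(frozenset({"l1"}), freeze({"l1": (NUM, NUM)}))}
    while True:
        new_domain = set(domain)

        def lookup(arg, store, current=mapping, seen=new_domain):
            key = (arg, freeze(store))
            seen.add(key)                               # grow the interesting domain
            value, out = current.get(key, (BOT, ()))    # unknown calls yield bottom
            return value, dict(out)

        new_mapping = {}
        for arg, frozen in domain:
            value, out = foo_body(arg, dict(frozen), lookup)
            new_mapping[(arg, frozen)] = (value, freeze(out))
        if new_mapping == mapping and new_domain == domain:
            return mapping
        mapping, domain = new_mapping, new_domain
```

Under these assumptions the loop stops on its third pass with two entries, one for the call from the main procedure and one for the recursive call, which corresponds to the two-entry mapping reached in the text.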

Function Environments

The abstract interpreter constructs a function environment for a program. Function environments, 
members of domain FEnv, map function names to function values. Function names are drawn 
from domain F, and function values are input-output mappings. 

    \[ \Phi \in \mathit{FEnv} = F \to ((V^{n} \times \mathit{Store}) \to (V \times \mathit{Store})) \]

Because KID - allows recursive function definitions, the interpreter must solve the set of recursive 
equations denoted by the program text that defines the function environment of the program. 

The way we keep track of the interesting domains of each function in the program is with a domain
map, or DMAP. Each $\mathcal{D}$ in DMAP is a mapping from function names to the interesting portions
of the domains of those functions. We also collect the change, or delta ($\Delta_{\mathcal{D}}$), in the interesting
portion of the domain of a function.

    \[ \mathcal{D}, \Delta_{\mathcal{D}} \in \mathit{DMAP} = F \to \mathcal{P}(V^{*} \times \mathit{Store}) \]

The expression evaluator returns a domain map delta as one of its results. 

4.3.3 Abstract Interpreter Definition 

This section describes an algorithm for abstract interpretation of programs. The algorithm makes 
use of the fact that we are interested in only a few of the elements from the domains of the abstract 
functions defined in a program. This interpreter computes the function environment of a program 
sparsely. That is, the interpreter only computes the elements of the mapping corresponding to the 
input values in which we are interested and to any other inputs that are needed to compute the 
function environment for those interesting inputs. 

The function $\mathcal{SE}_A$ takes a simple expression and an environment, and returns the value of the
expression in that environment. The function $\mathcal{E}_A$ takes an expression, an environment, a store,
and a function environment, and returns the resulting value and store. The function $\mathcal{PE}_A$ takes a
complete program and returns the value and store resulting from the execution of the program.

Note that the set of labels of objects allocated and deallocated during the execution of an expression 
or program is necessarily inexact. Under the abstract interpreter, these sets contain the abstraction 
of the object labels that may be allocated or deallocated under the standard interpreter. The most 
definite thing we can say is which labels were not allocated or deallocated — we cannot say that 
a given location was definitely allocated or deallocated. In the abstract interpreter, we do not 
compute the set of labels of objects that may be allocated or referenced within an expression — 
this is not needed to verify or insert deallocation commands. We do need to know which locations 
may be deallocated by an expression, however. 

The following are the signatures of the semantic functions: 

$\mathcal{SE}_A : SE \to Env \to V$

$\mathcal{E}_A : E \to Env \to Store \to FEnv \to (V \times Store \times DEVs \times DMAP)$

$\mathcal{VE}_A : Prog \to (V \times Store \times DEVs \times FEnv \times DMAP)$

where 

$\rho \in Env = X \to V$ (Environments)

$\Delta^- \in DEVs = \mathcal{P}(OL \times AL)$ (Deallocation Events)

Environments, members of domain Env, map variables, $X$, to denotable values. The empty 
environment, $\bot_{Env}$, maps all variables to $\bot$. Bindings are added to an environment when we evaluate 
the body of a function or a letrec block. Domain maps and domain map deltas map function 
names to sets of values in the interesting domains of those functions. Abstract deallocation events 
track the labels of the objects that were deallocated during the interpretation of an expression. 

Program Evaluator Definition 

Evaluation of a program, which is performed by $\mathcal{VE}_A$, defined below, consists of computing the 
function environment $\Phi$ for the program, and then evaluating the main expression of the program 
in that function environment. The following is the definition of the program interpreter. 

$$\begin{array}{l}
\mathcal{VE}_A[\![pr]\!] = \{\ (\Phi_0, \mathcal{D}_0) = InitialFEnv(pr); \\
\quad (\Phi, \mathcal{D}) = ComputeFEnv(pr, \Phi_0, \mathcal{D}_0); \\
\quad \mathbf{in}\ (v, \sigma, \Delta^-, \Phi, \Delta^{\mathcal{D}})\ \}
\end{array}$$

The abstract interpreter first constructs a function environment ($\Phi$) that, for each function $f$ in the 
program, maps particular input values of $f$ to the result of applying $f$ to those inputs. Whenever 
we encounter an application of a procedure $f$ we fetch its input-output mapping from the incoming 
function environment. Then we determine the output value corresponding to the set of input values 
provided (including the store and activation label) and use that value as the result of the activation. 
We also make sure that the entry for function $f$ and this set of inputs is non-bottom by adding 
these input values to the domain map for $f$. 

Once the abstract program interpreter has computed the function environment for the program, 
it evaluates the body of the main procedure $f_0$ of the program, and returns the result of this 
evaluation, along with the function environment, as the result of abstract interpretation of the 
program. 

Computing the Function Environment of a Program 

The function InitialFEnv takes a program and returns a function environment and a domain map. 
The initial function environment takes each function and returns bottom. The initial domain map 
takes a function name and returns the set containing bottom. 

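$$\begin{array}{l}
InitialFEnv(pr) = \\
\quad \{\ \Phi_0 = \bot_{FEnv}; \\
\quad\ \ \mathcal{D}_0[f_i] = \{\bot\},\ \forall f_i \in F \\
\quad \mathbf{in}\ (\Phi_0, \mathcal{D}_0)\ \}
\end{array}$$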

The function ComputeFEnv, shown in Figure 4.7, iteratively improves the approximations of the 
function environment and the domain map until no further information is added. It does this 
by computing a new entry in the function map of each function $f_i$ for each value in the 
interesting domain of the function. It also gathers new approximations to the interesting domain 
of the function. The process of computing new approximations is monotonic and all of the domains 
involved are finite, so the iteration is guaranteed to reach a stable value. It returns the most 
precise approximation as its result. 

$\mathbf{in}\ (\Phi'', \mathcal{D}'')\ \}$



Figure 4.7: Procedure to compute function environment 
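
The iteration described for ComputeFEnv can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not a transcription of Figure 4.7: abstract values are modelled as frozensets of labels (so the least upper bound is set union and bottom is the empty set), stores are omitted, and the names `compute_fenv`, `bodies`, and `call` are hypothetical.

```python
def compute_fenv(bodies, demand):
    """Sparse fixpoint: only inputs recorded in the domain map are evaluated."""
    bottom = frozenset()
    fenv = {}                                # (fname, arg) -> abstract result
    dmap = {f: set() for f in bodies}        # interesting inputs per function
    dmap[demand[0]].add(demand[1])
    changed = True
    while changed:
        changed = False
        delta = {f: set() for f in bodies}   # demands discovered this round

        def call(f, arg):
            delta[f].add(arg)                # make (f, arg) interesting
            return fenv.get((f, arg), bottom)

        for f, body in bodies.items():
            for arg in list(dmap[f]):
                old = fenv.get((f, arg), bottom)
                new = old | body(arg, call)  # results only ever grow
                if new != old:
                    fenv[(f, arg)] = new
                    changed = True
        for f in bodies:
            if not delta[f] <= dmap[f]:
                dmap[f] |= delta[f]
                changed = True
    return fenv, dmap

# Hypothetical recursive function: loop(a) = a U {l0} U loop(a U {l0})
bodies = {"loop": lambda a, call: a | {"l0"} | call("loop", a | {"l0"})}
fenv, dmap = compute_fenv(bodies, ("loop", frozenset({"l1"})))
```

Because both the function environment and the domain map only grow, and the set of labels is finite, the loop stops once a round adds nothing to either.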



Simple Expression Evaluator Definition 

The simple expression evaluator is defined in Figure 4.8. Because the integer and boolean domains 
have been summarized by single values, N_ representing any number and B_ representing any boolean, 
evaluation of constants returns less information than in the instrumented and standard interpreters. 
However, evaluation of variables is the same — the value of the variable is found in the current 
environment . 



Expression Evaluator Definition 

This section develops the definition of the abstracted expression evaluator. We can think of the 
expression evaluator as providing the rules for simplifying the right-hand-sides of the equations 
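that define the function environment of a program. 

$\mathcal{SE}_A[\![n]\!]\rho = \underline{N}$ where $n$ is a number 
$\mathcal{SE}_A[\![b]\!]\rho = \underline{B}$ where $b$ is a boolean 
$\mathcal{SE}_A[\![x]\!]\rho = \rho[x]$ where $x$ is a variable 

Figure 4.8: Abstracted simple expression evaluator 

$\mathcal{E}_A[\![n]\!]\rho\sigma\Phi = (\mathcal{SE}_A[\![n]\!]\rho, \sigma, \emptyset, \bot_{DMAP})$ 
$\mathcal{E}_A[\![b]\!]\rho\sigma\Phi = (\mathcal{SE}_A[\![b]\!]\rho, \sigma, \emptyset, \bot_{DMAP})$ 
$\mathcal{E}_A[\![x]\!]\rho\sigma\Phi = (\mathcal{SE}_A[\![x]\!]\rho, \sigma, \emptyset, \bot_{DMAP})$ 

$\mathcal{E}_A[\![+(se_1, se_2)]\!]\rho\sigma\Phi = (\underline{N}, \sigma, \emptyset, \bot_{DMAP})$ 

Figure 4.9: Evaluation of simple expressions and primitive operators 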



The first four clauses of the interpreter, shown in Figure 4.9, define the semantics of numbers, 
booleans, variables and arithmetic primitives. The first three clauses describe how the evaluator 
handles simple expressions: it calls the abstract simple expression evaluator to produce the result 
values. The fourth clause shows how primitive arithmetic operations are interpreted: either N_ or 
B_ is returned, depending on the type of the operator. The evaluation of primitive arithmetic and 
logical operations can proceed without examining the arguments to the operator because the values 
of integers and booleans are ignored. These four clauses do not modify the store or deallocate any 
objects, so $\sigma$ and $\emptyset$ are returned, and they do not add any elements to the delta 
domain map, so $\bot_{DMAP}$ is returned. 
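
A minimal Python sketch of these four clauses follows. The encoding is an assumption made for illustration: the strings "N" and "B" stand for the summary values N_ and B_, a Python dict stands for the environment, and the operator table `PRIM_RESULT` is hypothetical.

```python
# Assumed table giving the result type of each primitive operator.
PRIM_RESULT = {"+": "N", "-": "N", "*": "N", "<": "B", "==": "B"}

def se_abstract(se, env):
    """Abstract simple-expression evaluator: constants lose their value."""
    if isinstance(se, bool):          # test bool first: bool is a subclass of int
        return "B"
    if isinstance(se, (int, float)):
        return "N"
    return env[se]                    # a variable is looked up in the environment

def eval_simple(se, env, store):
    # Store unchanged, no deallocation events, empty domain-map delta.
    return se_abstract(se, env), store, frozenset(), {}

def eval_prim(op, args, env, store):
    # The arguments are never inspected; only the operator's type matters.
    return PRIM_RESULT[op], store, frozenset(), {}
```

For example, `eval_prim("<", ("a", 3), {}, {})` yields "B" as its value without looking up `a`, which mirrors the remark that primitive operations can be evaluated without examining their arguments.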

The first major difference between the abstract expression interpreter and the instrumented 
interpreter is in the handling of function applications. When we interpret a procedure application in 
the abstract interpreter, we look up the result values in the incoming function environment rather 
than directly evaluating the body of the function, as we did in the standard interpreters. First, we 
compute the input value to function $f$. We use the values and current store $\sigma$ as input into the 
function map for $f$. The clause of the interpreter for function applications is shown in Figure 4.10. 
Note that we return a delta-domain-map $\Delta^{\mathcal{D}}$ with the singleton set containing the current input 
value for procedure $f$. This ensures that we compute the value of function $f$ applied to this input 
value in future iterations of ComputeFEnv. 

The evaluation of conditionals, shown in Figure 4.11, computes a summarization of the value that 
the conditional could yield under any execution of the program. In the abstracted interpreter, the 
predicate is ignored and both branches of the conditional are executed. The least upper bound of the 
values returned by the conditional branches is returned as the result of the conditional expression. 
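
The treatment of conditionals can be sketched in Python as below. This is an illustrative simplification rather than the clause of Figure 4.11 itself: the domain-map delta is omitted, abstract values are "N", "B", a frozenset of labels, or `None` for bottom, and the helper names `join`, `join_store`, and `eval_if` are assumptions.

```python
BOTTOM = None

def join(v1, v2):
    """Least upper bound of two abstract values (assumed well typed)."""
    if v1 is BOTTOM:
        return v2
    if v2 is BOTTOM:
        return v1
    if isinstance(v1, frozenset) and isinstance(v2, frozenset):
        return v1 | v2
    if v1 == v2:
        return v1
    raise TypeError("join of differently typed values")

def join_store(s1, s2):
    """Pointwise least upper bound of two abstract stores."""
    out = dict(s1)
    for label, tup in s2.items():
        if label in out:
            out[label] = tuple(join(a, b) for a, b in zip(out[label], tup))
        else:
            out[label] = tup
    return out

def eval_if(then_branch, else_branch, store):
    # The predicate is ignored: both branches run on the same incoming store.
    v1, s1, d1 = then_branch(store)
    v2, s2, d2 = else_branch(store)
    return join(v1, v2), join_store(s1, s2), d1 | d2

store = {"l1": ("N", "N")}
then_branch = lambda s: (frozenset({"l0"}), {**s, "l0": ("N", "N")}, frozenset())
else_branch = lambda s: (frozenset({"l1"}), s, frozenset())
value, out_store, deallocs = eval_if(then_branch, else_branch, store)
```

The resulting value refers to both `l0` and `l1`, since either branch might be taken at run time.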

Abstract evaluation of block expressions is nearly the same as instrumented evaluation of block 
expressions. Figure 4.12 shows the clause of the abstract interpreter for block expressions. 

The abstract evaluation rules for tuple primitives are shown in Figure 4.13. The evaluation of the 
MakeTuple primitive is similar to the instrumented interpreter clause, except that a singleton set 
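containing the abstract object label is returned as the result. Note that the new abstract object 
label is just $\epsilon : l$, the label on the MakeTuple primitive. 

$$\begin{array}{l}
\mathcal{E}_A[\![{}^{k}f(se_1,\cdots,se_n)]\!]\rho\sigma\Phi = \\
\quad \{\ v_1 = \mathcal{SE}_A[\![se_1]\!]\rho; \\
\quad\ \ \vdots \\
\quad\ \ v_n = \mathcal{SE}_A[\![se_n]\!]\rho; \\
\quad\ \ (v', \sigma', \Delta^{-\prime}) = \Phi[f][v_1,\cdots,v_n,\sigma]; \\
\quad\ \ \Delta^{\mathcal{D}} = \bot_{DMAP}[f \mapsto \{(v_1,\cdots,v_n,\sigma)\}]; \\
\quad \mathbf{in}\ (v', \sigma', \Delta^{-\prime}, \Delta^{\mathcal{D}})\ \}
\end{array}$$

Figure 4.10: Abstract evaluation of function applications 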



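$$\begin{array}{l}
\mathcal{E}_A[\![\mathbf{if}(se, e_1, e_2)]\!]\rho\sigma\Phi = \\
\quad \{\ (v_1, \sigma_1, \Delta^-_1, \Delta^{\mathcal{D}}_1) = \mathcal{E}_A[\![e_1]\!]\rho\sigma\Phi; \\
\quad\ \ (v_2, \sigma_2, \Delta^-_2, \Delta^{\mathcal{D}}_2) = \mathcal{E}_A[\![e_2]\!]\rho\sigma\Phi; \\
\quad \mathbf{in}\ (v_1 \sqcup v_2,\ \sigma_1 \sqcup \sigma_2,\ \Delta^-_1 \cup \Delta^-_2,\ \Delta^{\mathcal{D}}_1 \cup \Delta^{\mathcal{D}}_2)\ \}
\end{array}$$

Figure 4.11: Evaluation of conditional expressions 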

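$$\begin{array}{l}
\mathcal{E}_A[\![\{ Bs;\ Ds\ \mathbf{in}\ x \}]\!]\rho\sigma\Phi = \\
\quad \{\ [\![x_1 = e_1; \ldots; x_n = e_n]\!] = Bs; \\
\quad\ \ [\![Dealloc(y_1); \cdots; Dealloc(y_k)]\!] = Ds; \\
\quad\ \ \rho_0 = \rho[\bot/x_1, \cdots, \bot/x_n]; \\
\quad\ \ (\rho', \sigma', \Delta^-, \Delta^{\mathcal{D}}) = EvalBindings_A(Bs, \Phi, \rho_0, \sigma); \\
\quad\ \ \Delta^{-\prime} = \Delta^- \cup \bigcup_i \rho'[y_i]; \\
\quad \mathbf{in}\ (\rho'[x], \sigma', \Delta^{-\prime}, \Delta^{\mathcal{D}})\ \} \\
\mathbf{where} \\
EvalBindings_A([\![x_1 = e_1; \ldots; x_n = e_n]\!], \Phi, \rho, \sigma) = \\
\quad \{\ (v_1, \sigma_1, \Delta^-_1, \Delta^{\mathcal{D}}_1) = \mathcal{E}_A[\![e_1]\!]\rho\sigma\Phi; \\
\quad\ \ \vdots \\
\quad\ \ (v_n, \sigma_n, \Delta^-_n, \Delta^{\mathcal{D}}_n) = \mathcal{E}_A[\![e_n]\!]\rho\sigma\Phi; \\
\quad\ \ \rho' = \rho[(v_1 \sqcup \rho[x_1])/x_1, \cdots, (v_n \sqcup \rho[x_n])/x_n]; \\
\quad\ \ \sigma' = \bigsqcup_i \sigma_i; \\
\quad\ \ \Delta^{-\prime} = \bigcup_i \Delta^-_i; \\
\quad\ \ \Delta^{\mathcal{D}\prime} = \bigcup_i \Delta^{\mathcal{D}}_i; \\
\quad\ \ (\rho'', \sigma'', \Delta^{-\prime\prime}, \Delta^{\mathcal{D}\prime\prime}) = \\
\quad\quad \mathbf{if}\ \rho' \sqsubseteq \rho \wedge \sigma' \sqsubseteq \sigma \\
\quad\quad \mathbf{then}\ (\rho', \sigma', \Delta^{-\prime}, \Delta^{\mathcal{D}\prime}) \\
\quad\quad \mathbf{else}\ EvalBindings_A([\![x_1 = e_1; \ldots; x_n = e_n]\!], \Phi, \rho', \sigma') \\
\quad \mathbf{in}\ (\rho'', \sigma'', \Delta^{-\prime\prime}, \Delta^{\mathcal{D}\prime\prime})\ \}
\end{array}$$

Figure 4.12: Evaluation of block expressions 

$$\begin{array}{l}
\mathcal{E}_A[\![{}^{l}MakeTuple(se_1, \cdots, se_m)]\!]\rho\sigma\Phi = \\
\quad \{\ v_1 = \mathcal{SE}_A[\![se_1]\!]\rho; \\
\quad\ \ \vdots \\
\quad\ \ v_m = \mathcal{SE}_A[\![se_m]\!]\rho; \\
\quad\ \ v_{tuple} = \langle Tuple\ v_1, \cdots, v_m \rangle; \\
\quad\ \ ol = \epsilon : l; \\
\quad\ \ v'_{tuple} = \sigma[ol]; \\
\quad\ \ \sigma' = \sigma[ol \mapsto (v_{tuple} \sqcup v'_{tuple})]; \\
\quad \mathbf{in}\ (\{ol\}, \sigma', \emptyset, \bot_{DMAP})\ \} \\
\\
\mathcal{E}_A[\![Select_i(se)]\!]\rho\sigma\Phi = \\
\quad \{\ ls = \mathcal{SE}_A[\![se]\!]\rho; \\
\quad\ \ \langle Tuple\ v_1, \cdots, v_m \rangle = \bigsqcup_{ol \in ls} \sigma[ol]; \\
\quad \mathbf{in}\ (v_i, \sigma, \emptyset, \bot_{DMAP})\ \}
\end{array}$$

Figure 4.13: Abstract evaluation of tuple primitives 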



In the clause for primitive $Select_i$, note that we took the least upper bound of all the tuples that 
may be referred to by $ls$, and then returned the ith component of that tuple. We could also have 
selected the ith components of all the tuples to which the set Is refers, and then returned the least 
upper bound of these values. We can see that the two methods are equivalent by examining the 
definitions of least upper bound on values and tuples. 
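
The equivalence of the two methods can be checked on a small case with the Python sketch below. It assumes that the least upper bound of tuples is taken componentwise, that label sets are joined by union, and that the set of labels is non-empty; the function names are hypothetical, and component indices are 0-based here whereas $Select_i$ counts from 1.

```python
def join(v1, v2):
    """Least upper bound of two abstract values; None stands for bottom."""
    if v1 is None:
        return v2
    if v2 is None:
        return v1
    if isinstance(v1, frozenset) and isinstance(v2, frozenset):
        return v1 | v2
    if v1 == v2:
        return v1
    raise TypeError("join of differently typed values")

def select_lub_then_project(ls, store, i):
    """Join all tuples that ls may refer to, then take component i."""
    lub = None
    for label in ls:
        tup = store[label]
        lub = tup if lub is None else tuple(join(a, b) for a, b in zip(lub, tup))
    return lub[i]

def select_project_then_lub(ls, store, i):
    """Take component i of each tuple, then join the components."""
    result = None
    for label in ls:
        result = join(result, store[label][i])
    return result

store = {"l0": (frozenset({"l2"}), "N"), "l1": (frozenset({"l3"}), "N")}
ls = frozenset({"l0", "l1"})
```

Both functions return the label set containing `l2` and `l3` for component 0, and "N" for component 1.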



4.4 Soundness of the Abstracted Interpreter 

This section shows that our abstract interpreter is sound. 

Theorem 4.2 The abstracted interpreter $\mathcal{E}_A$ is extensive with respect to stores. 

$\forall \rho \in Env, \sigma_0 \in Store, \Phi \in FEnv,$ 
$(v, \sigma_1, \Delta^-, \Delta^{\mathcal{D}}) = \mathcal{E}_A[\![e]\!]\rho\sigma_0\Phi \Rightarrow \sigma_0 \sqsubseteq \sigma_1$ 

Proof: 

Similar to the proof of the extensionality of the standard interpreter. ∎ 

Theorem 4.3 The interpreter functions $\mathcal{SE}_A$ and $\mathcal{E}_A$ are monotonic. 

$\forall se \in SE, \forall \rho_0, \rho_1 \in Env,$ 

$\rho_0 \sqsubseteq \rho_1 \Rightarrow \mathcal{SE}_A[\![se]\!]\rho_0 \sqsubseteq \mathcal{SE}_A[\![se]\!]\rho_1$ 

$\forall e \in E, \forall \rho_0, \rho_1 \in Env, \forall \sigma_0, \sigma_1 \in Store, \forall \Phi_0, \Phi_1 \in FEnv,$ 

$\rho_0 \sqsubseteq \rho_1 \wedge \sigma_0 \sqsubseteq \sigma_1 \wedge \Phi_0 \sqsubseteq \Phi_1 \Rightarrow \mathcal{E}_A[\![e]\!]\rho_0\sigma_0\Phi_0 \sqsubseteq \mathcal{E}_A[\![e]\!]\rho_1\sigma_1\Phi_1$ 

Proof: 

Similar to the proof of the monotonicity of the standard interpreter. ∎ 

Finally, we require that the abstract interpreter always terminates in a finite amount of time. 

Theorem 4.4 The abstract program evaluator VEa always terminates in a finite amount of time. 



Proof: 



The simple evaluator SEa terminates because it either returns the value of a literal constant 
or else looks up a variable in the environment. 

We use structural induction to show that the expression evaluator Ea terminates in a finite 
amount of time. The expression evaluator Ea has three cases: function application expres- 
sions, letrec block expressions, and all other expressions. 

— The evaluation of function applications takes a finite amount of time because the ex- 
pression evaluator makes a finite number of calls to the simple expression evaluator, and 
then looks up the result of the function application in the function environment. 

— The evaluation of letrec blocks consists of iterating to refine the environment and store of 
the body of the block. Each iteration consists of evaluating a finite number of expressions, 
so each iteration takes a finite amount of time (using our induction hypothesis). All 
chains in our domains are finite, so it takes only a finite number of iterations 
for the values of the environment and store to climb to their limits. 

— The evaluation of all other expressions consists of evaluating a finite number of subex- 
pressions and combining the results in some way. Evaluating the subexpressions and 
combining the results each take a finite amount of time. 

Finally, we require the computation of the function environment to take a finite amount 
of time. Each iteration of this process consists of evaluating each function over each value in 
the interesting portion of that function's domain. All of our value domains are bounded by 
program size, so the functions' domains must be finite. Evaluation of the body of the function 
uses the expression evaluator, so that must take a finite amount of time. The fixpoint iteration 
used to compute the function environment must also terminate in a finite number of steps 
because the sizes of all domains are finite. 



4.5 Safety of the Abstracted Interpreter 

In this section we show that the abstract interpreter VEa preserves the behavior of the instrumented 
interpreter. 

An abstract interpretation is considered to be safe if the abstraction of a function preserves the 
behavior of the concrete function. 

Definition 4.5 (Abstract Interpretation Safety) Given domains $A$, $\hat{A}$, $B$, and $\hat{B}$, and 
abstraction functions $Abs_A : A \to \hat{A}$ and $Abs_B : B \to \hat{B}$, we say an abstract function 
$\hat{f} : \hat{A} \to \hat{B}$ is safe for concrete function $f : A \to B$ if the following condition holds: 

$\forall a \in A, \forall \hat{a} \in \hat{A}.$ 

$Abs_A(a) \sqsubseteq_{\hat{A}} \hat{a} \Rightarrow Abs_B(f(a)) \sqsubseteq_{\hat{B}} \hat{f}(\hat{a})$ 

If our abstract interpreter is safe by this definition, then it must preserve object reachability. 

Theorem 4.6 $\mathcal{SE}_A$ is safe for $\mathcal{SE}$: 

$\forall se \in SE, \forall \rho \in Env, \forall \hat{\rho} \in Env_A :$ 

$Abs_{Env}(\rho) \sqsubseteq \hat{\rho} \Rightarrow Abs_V(\mathcal{SE}[\![se]\!]\rho) \sqsubseteq \mathcal{SE}_A[\![se]\!]\hat{\rho}$ 

Proof: 

By structural induction over SE: 

— If se is a boolean, then SE returns either True or False, which abstract to B_, and SEa 
returns B_. 

— If se is a number, then SE returns a number, which abstracts to N_, and SEa returns N_. 

— If se is a variable x, then SE returns $\rho[x]$ and SEa returns $\hat{\rho}[x]$. By our definition of 
$Abs_{Env}$, 

$Abs_{Env}(\rho) \sqsubseteq \hat{\rho} \Rightarrow Abs_V(\rho[x]) \sqsubseteq \hat{\rho}[x]$ 

Theorem 4.7 $\mathcal{E}_A$ is safe for $\mathcal{E}_I$. Given function environment $\Phi$ for program pr: 

$\forall e \in pr, \forall \alpha \in AL, \forall \rho \in Env, \forall \hat{\rho} \in Env_A, \forall \sigma \in Store, \forall \hat{\sigma} \in Store_A :$ 
$Abs_{Env}(\rho) \sqsubseteq \hat{\rho} \wedge Abs_{Store}(\sigma) \sqsubseteq \hat{\sigma} \Rightarrow Abs(\mathcal{E}_I[\![e]\!]\rho\sigma\alpha) \sqsubseteq \mathcal{E}_A[\![e]\!]\hat{\rho}\hat{\sigma}\Phi$ 



Proof: 



By structural induction over E: 

— If e is a simple expression, then Ei and Ea call the corresponding simple evaluators, so 
Ea is safe for Ei. 

— If e is a primitive arithmetic expression, then Ea returns either N_ or B_, depending on 
its type. These values contain the abstraction of all possible values that Ei could return. 

— If e is a function application, then Ea calls SE a to evaluate the arguments. These 
abstract argument values contain the abstractions of the corresponding calls to SEi 
made by Ei. The function environment for a program maps a set of abstract inputs 
to the most general value a function could return when applied to any concrete inputs 
contained in the abstract inputs. Therefore, Ea is safe for Ei for function applications 
because it looks up the result in the function environment. 

— If e is a conditional, then Ea returns the least upper bound of the evaluation of both 
branches of the conditional. This value is greater than the result of evaluating either of 
the branches of the conditional, so it must contain the value of Ei applied to the branch 
of the conditional that is taken under the instrumented interpreter. 

— If e is a letrec block, then evaluation consists of fixpoint iteration to compute the 
block's environment and store. For each iteration, we take an approximation of the 
environment and store and generate refined approximations. This process is safe, by our 
induction hypothesis, because we call Ea and Ei on the subexpressions to compute the 
contributions to the new approximations to the environment and store. The final result 
must be safe, because each iteration is safe and because both Ea and Ei are monotonic. 

— If e is a tuple primitive, then the simple expression evaluators are called to evaluate the 
arguments and a tuple is constructed or dereferenced. The object label constructed by 
Ea is an abstraction of the object label constructed by Ei. Any tuple allocated by Ea is 
an abstraction of a tuple allocated by Ei because each of the components under Ea is an 
abstraction of the corresponding values under Ei. 



Theorem 4.8 $\mathcal{VE}_A$ is safe for $\mathcal{VE}_I$: 

$\forall pr \in Prog,$ 

$Abs(\mathcal{VE}_I[\![pr]\!]) \sqsubseteq \mathcal{VE}_A[\![pr]\!]$ 



Proof: 



In order to show that the abstract program interpreter is safe for the instrumented program 
interpreter, we must show that the abstract interpreter constructs a function environment 
that is safe with respect to the behavior of each of the functions in the program. 

We do this by induction on the depth of nesting of function calls. 

— Base case: The initial function environment $\Phi_0$ maps all functions to input-output tables 
that map all inputs to bottom. $\Phi_0$ is safe for all expressions that do not call procedures. 

— Induction Hypothesis: We assume that function environment $\Phi_k$ is safe for the abstract 
interpretation of expressions that expand to a call depth of k. To compute $\Phi_{k+1}$, 
the (k + 1)st approximation to the function environment, we evaluate the body of 
each function applied to each value in the interesting domain of the function using Ea 
and function environment $\Phi_k$. This yields $\Phi_{k+1}$, which is safe for expressions that expand 
to a call depth of k + 1, because Ea is safe for Ei. 



4.6 Determining Object Lifetimes Statically 

We can apply the abstract interpreter to a program prog to get a value, a store, and a set of 
deallocation events. Consider the example shown in Figure 4.14. The result of evaluating the 
program is a number and a store containing one tuple. If we follow the execution of the application 
of f to 68, we see that its result is 

$(\underline{N},\ \bot_{Store}[\epsilon : l_0 \mapsto \langle Tuple\ \underline{N}, \underline{N} \rangle],\ \emptyset,\ \Delta^{\mathcal{D}})$ 

meaning that a number is returned as the result and that a tuple labeled $\epsilon : l_0$ was allocated during 
the execution of the application. Because the tuple is not reachable from the result we know that 
the lifetime of the tuple ends when the invocation of f ends. With a little more information about 
which identifiers in f are bound to the tuple, we could transform the program so that the storage 
associated with the tuple is reclaimed when f terminates. 

{ def f(w) = 
    { t = ^{k_0}g(w); 
      a = Select_1(t); 
      b = Select_2(t); 
      r = (a * b); 
     in r }; 
  def g(x) = 
    { y = (x-21); 
      t = ^{l_0}MakeTuple(x,y); 
     in t }; 
  def f_0() = 
    ^{k_1}f(68) 
}; 

Figure 4.14: Example with non-nested structures 
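
The reachability argument used for Figure 4.14 can be sketched in Python. The encoding is an assumption made for illustration: an abstract store is a dict from labels to tuples of abstract values, a reference is a frozenset of labels, and `reachable` is a hypothetical helper rather than a function defined in this chapter.

```python
def reachable(value, store):
    """Labels reachable from an abstract value through the abstract store."""
    seen = set()
    work = list(value) if isinstance(value, frozenset) else []
    while work:
        label = work.pop()
        if label in seen:
            continue
        seen.add(label)
        for component in store.get(label, ()):
            if isinstance(component, frozenset):
                work.extend(component)
    return seen

# Result of applying f to 68 in Figure 4.14: a number and one allocated tuple.
result = "N"
store = {"l0": ("N", "N")}
dead = set(store) - reachable(result, store)   # allocated but not reachable
```

Here `dead` contains `l0`, matching the observation that the tuple's lifetime ends when the invocation of f ends.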



{ def f(w) = 
    { t1 = ^{k_0}g(w); 
      w2 = w * 2; 
      t2 = ^{k_1}g(w2); 
      r = (w * w2); 
      t3 = ^{l_1}MakeTuple(t1,t2,r); 
     in t3 }; 
  def g(x) = 
    { y = (x-21); 
      t = ^{l_0}MakeTuple(x,y); 
     in t }; 
  def f_0() = 
    ^{k_2}f(68); 
}; 

Figure 4.15: Example with false sharing 



The example shown earlier in this chapter in Figure 4.5 is slightly more complicated than the 
one we just did. It consists of a recursive procedure, foo, that allocates a tuple in each recursive 
iteration. We went through the steps of abstract interpretation in detail in Section 4.3.2, obtaining 
the following value: 

$(\underline{N},\ \bot_{Store}[\epsilon : l_0 \mapsto \langle Tuple\ \underline{N}, \underline{N} \rangle,\ \epsilon : l_1 \mapsto \langle Tuple\ \underline{N}, \underline{N} \rangle],\ \emptyset,\ \Delta^{\mathcal{D}})$ 

One thing we can conclude by examining the program and this result is that both tuples allocated 
are no longer in use at the end of the program, because the program returns a number as its result. 

The example in Figure 4.15 is interesting because it shows how the abstraction of labels can cause 
apparent sharing of structures, when in the standard or instrumented semantics there actually 
would be no sharing. The result of this program under abstract interpretation is: 

$(\{l_1\},\ \bot_{Store}[l_0 \mapsto \langle Tuple\ \underline{N}, \underline{N} \rangle,\ l_1 \mapsto \langle Tuple\ \{l_0\}, \{l_0\}, \underline{N} \rangle],\ \emptyset,\ \Delta^{\mathcal{D}})$ 

That is, the result of the program is the structure contained in locations $\{l_1\}$, which is a three tuple 
containing a reference to location $l_0$, another reference to location $l_0$, and a number. Note that while 
the tuples allocated by the two calls to g would have been allocated in different locations $\epsilon.k_2.k_0 : l_0$ 
and $\epsilon.k_2.k_1 : l_0$ under the instrumented semantics, they have been assigned the same location under 
the abstract semantics. Thus, any analysis performed using this abstraction of activation labels will 
not be able to distinguish the tuples allocated by distinct calls to a procedure. Chapter 6 discusses 
an improved approximation of activation labels that solves this problem. 
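
The source of the false sharing can be made concrete with a short Python sketch. The encoding is hypothetical: a concrete object label is a pair of an activation path and a static allocation-site label, and the abstraction replaces every activation path by the empty path.

```python
def abstract_label(concrete_label):
    """Abstract an object label by discarding its activation path."""
    activation_path, site = concrete_label
    return ((), site)              # () plays the role of the empty activation label

# The two tuples allocated by the two calls to g in Figure 4.15.
first_call = (("k2", "k0"), "l0")
second_call = (("k2", "k1"), "l0")

distinct_concretely = first_call != second_call
shared_abstractly = abstract_label(first_call) == abstract_label(second_call)
```

Both flags are true: the objects are distinct under the instrumented semantics, yet they collapse to one abstract location.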

Chapter 5 presents two algorithms that use the abstract interpreter defined in this chapter. The 
first algorithm verifies the safety of deallocation commands in programs, and the second inserts 
deallocation commands into programs. The basic approach is similar to what we have done in 
this section. The compiler uses the abstract interpreter to compute input-output mappings of 
all procedures. Then the compiler processes the body of each procedure, computing the possible 
bindings of all variables in the body of the procedure and using our deallocation safety criteria to 
verify or insert deallocation commands. 



Chapter 5 

Verifying and Inserting Deallocation 

Commands 



We have seen how to interpret KID - programs in such a way as to determine what objects are cre- 
ated by the program (or each procedure in the program) and what objects passed into a procedure 
are reachable from the result of that procedure. We now investigate how to turn our abstract inter- 
preter into an algorithm to verify deallocation commands and an algorithm to insert deallocation 
commands. 

The verification and insertion algorithms both compute the function environment for a whole 
program and then recursively traverse the body of each procedure, calling the abstract expression 
evaluator Ea to provide a summary of the value to which each identifier could be bound. 

Both the verification and insertion algorithms must analyze procedure bodies with respect to a set 
of arguments provided to that procedure. In this chapter we discuss how the choice of input values 
affects the performance of the analysis of procedure bodies. We also describe how we choose input 
values for use in the verification and insertion algorithms. 

As we did in Chapter 4, we restrict our discussion in this chapter to programs using tuples as 
their only data structure. In the next two chapters we discuss both the additions to the abstract 
interpreter, and the insertion and verification algorithms needed to handle arrays, algebraic types, 
and lists. 

The first section of this chapter presents formal conditions for the correctness of a deallocation 
statement. The second section discusses the choice of input values used during the analysis of 
a procedure and presents an algorithm for choosing these values. The third section presents a 
mechanical algorithm for verifying the correctness of deallocation statements in KID - programs. 
The final section of this chapter presents a simple algorithm for inserting correct deallocation 
statements into a program. 

5.1 Object Deallocation Safety 

Let us consider a deallocation statement in block expression e, shown below, and use the abstract 
semantics to show the conditions under which this deallocation statement can never lead to a 
run-time error. Expression e is a generic letrec expression containing several deallocation commands. 

$$\begin{array}{l}
e = \{\ x_1 = e_1; \\
\quad\quad \vdots \\
\quad\quad x_n = e_n; \\
\quad\quad Dealloc(y_1); \\
\quad\quad \vdots \\
\quad\quad Dealloc(y_m) \\
\quad\ \ \mathbf{in}\ x_j\ \}
\end{array}$$

where the environment, store, and function environment in which e is to be evaluated are $\rho$, $\sigma$, and 
$\Phi$, respectively. 

We can compute environment $\rho'$ and store $\sigma'$, the resulting environment and store for the block 
bindings, $\Delta^-$, the set of labels deallocated by the block bindings, and $v$, which is the result of the 
evaluation of the expression, as shown below: 

$$\begin{array}{l}
\rho_0 = \rho[\bot/x_1, \cdots, \bot/x_n] \\
(\rho', \sigma', \Delta^-, \Delta^{\mathcal{D}}) = EvalBindings_A([\![x_1 = e_1; \ldots; x_n = e_n]\!], \Phi, \rho_0, \sigma, \bot_{DMAP}) \\
v = \rho'[x_j] \\
R = Reachable(v, \sigma') \\
I = \bigcup_{y \in FV(e)} Reachable(\rho'[y], \sigma)
\end{array}$$

Consider each variable $y_i$ whose value is deallocated in the above block expression. If the value of 
$y_i$ is to be deallocated safely upon termination of the letrec block, then the value to which $y_i$ is 
bound, $\rho'[y_i]$, must be a reference $ls$ to a tuple. Furthermore, that tuple must have been allocated 
within the execution of expression e. Therefore, none of the labels to which $y_i$ could be bound may 
be in the set of labels inherited from the context in which this expression is executed. Also, none 
of the labels in $\rho'[y_i]$ can be reachable from the result $\rho'[x_j]$ of the block expression. Finally, none 
of those labels can be in the set of objects deallocated by other deallocation commands. 

Condition 5.1 (Deallocation Command Safety) The deallocation statement 

$Dealloc(y_i)$, shown in the code fragment above, is safe if the following three conditions hold: 

1. $\rho'[y_i] \cap I = \emptyset$ ($y_i$ is not inherited) 

2. $\rho'[y_i] \cap R = \emptyset$ ($y_i$ does not escape) 

3. $\forall y_j \neq y_i .\ \rho'[y_i] \cap (\rho'[y_j] \cup \Delta^-) = \emptyset$ ($y_i$ is not deallocated elsewhere) 

If Condition 5.1 is satisfied, then Theorem 3.10 from Chapter 3 applies and it is guaranteed that 
this deallocation command will not lead to dangling pointer errors. 
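
A check of Condition 5.1 might look like the following Python sketch. It is an illustration under assumptions, not the verification algorithm of this chapter: label sets are frozensets or sets of strings, the parameter names are hypothetical, and the test against the deallocation events of the block bindings is applied even when there is no other Dealloc command, which is a conservative reading of the third condition.

```python
def dealloc_is_safe(y, dealloc_vars, env, inherited, escaping, dealloc_events):
    """Check the three parts of Condition 5.1 for the command Dealloc(y)."""
    target = env[y]
    if not isinstance(target, frozenset):
        return False                      # y must be bound to tuple labels
    if target & inherited:                # 1. y is not inherited
        return False
    if target & escaping:                 # 2. y does not escape
        return False
    if target & dealloc_events:           # 3. not freed by the block bindings
        return False
    for other in dealloc_vars:            # 3. not freed by another Dealloc
        other_value = env[other]
        if other != y and isinstance(other_value, frozenset) and target & other_value:
            return False
    return True

env = {"t": frozenset({"l0"}), "u": frozenset({"l1"}), "r": "N"}
inherited = {"l1"}        # labels reachable from the free variables of the block
escaping = set()          # labels reachable from the block's result
events = set()            # labels deallocated by the block bindings
```

With these values `Dealloc(t)` passes all three tests, while `Dealloc(u)` is rejected because `l1` is inherited.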

5.2 Choice of Procedure Arguments 

We need to determine the behavior of a procedure over all possible values to which the procedure 
could be applied in order to verify that the deallocation commands in that procedure are safe or in 
order to insert safe deallocation commands. One way to do this is to analyze a procedure when it is 
applied to the least upper bound of all the abstract values in the domain of that procedure. Another 
choice of input values would be the least upper bound of the values in the interesting portion of 
the domain of the procedure. We discuss the use of the interesting domain of the procedure in 
this section, discuss how it sometimes prevents us from verifying deallocation commands that are 
actually correct, and then develop a better set of input values for use during analysis. 

5.2.1 Most General Input Values 

Let us consider the analysis of the body of a procedure when applied to the least upper bound of 
the values in the interesting domain of that procedure. For example, let us analyze the procedure 
foo, defined below: 

def foo(w,n) = 
  { a = Select_1(w); 
    b = Select_2(w); 
    c = a + 1; 
    p = c < n; 
    r = if p then 
          { w' = ^{l_0}MakeTuple(c,b); 
            r' = ^{k_0}foo(w',n); 
           in r' } 
        else b; 
   in r }; 

where the interesting portion of the domain of foo might be: 

$(\bot, \bot, \bot_{Store}),$ 

$(\{l_1\}, \underline{N}, [l_1 \mapsto \langle Tuple\ \underline{N}, \underline{N} \rangle]),$ 

$(\{l_0\}, \underline{N}, [l_1 \mapsto \langle Tuple\ \underline{N}, \underline{N} \rangle, l_0 \mapsto \langle Tuple\ \underline{N}, \underline{N} \rangle])$ 

where the tuple labeled $l_1$ was allocated somewhere else in the program. 
The least upper bound of these values is the triple: 

$(\{l_0, l_1\}, \underline{N}, [l_1 \mapsto \langle Tuple\ \underline{N}, \underline{N} \rangle, l_0 \mapsto \langle Tuple\ \underline{N}, \underline{N} \rangle])$ 

If we examine this program by hand, we see that the lifetime of the object bound to w' is contained 
in the lifetime of the letrec block in the then side of the conditional, because foo never returns 
its input (so w' does not escape) and w' is always bound to a freshly allocated tuple (so w' 
is allocated within the letrec block). 

However, if we apply the procedure foo to the most general input value that we constructed above, 
we find the following possibilities for the bindings of variables w, w' and r' in environment $\rho'$ within 
the body of foo: 

$\rho'[w] = \{l_0, l_1\}$ 
$\rho'[w'] = \{l_0\}$ 
$\rho'[r'] = \underline{N}$ 

The set $I$ of labels that may be inherited by procedure foo is the set $\{l_0, l_1\}$, and the set of labels 
that may escape from foo is empty. Thus, even though we can determine by inspection that w' is 
never bound to an object that escapes from the letrec block, we find that w' may be bound to 
an object inherited by procedure foo. We cannot determine that the lifetime of w' is contained in 
the inner letrec block using this approach to lifetime analysis. 

In this example, we have come to a safe conclusion. It is always safe to overestimate the lifetime 
of an object, but if the overestimates are too large we will never be able to verify or insert any 
deallocation commands. In fact, if we follow this strategy of using the most general input value 
when we analyze a procedure, we will never be able to verify the deallocation of a structure that is created 
by a procedure, passed to a recursive call, but not returned from the procedure. The reason is that 
passing an object label to a recursive call guarantees that the label is in the most general input 
value; therefore, the label will always be considered inherited by the procedure. 

5.2.2 Desired Properties for Input Values 

What are the important properties of the input values to which we apply a procedure during 
analysis? From the standpoint of lifetime analysis, the most important thing we know about these 
values is that they came from outside the procedure, and that if some variable within the procedure 
may be bound to one of these values, then the lifetime of the object to which that variable is bound 
may not be enclosed by the lifetime of the procedure. So we desire to choose input values in 
such a way that we can determine which variables are "contaminated" by input values, without 
getting any spurious contamination signals. We must also choose input values so that we never 
miss any contamination signals. Therefore, we cannot use bottom as an input value when analyzing 
a procedure. 

The input values we choose must also have the right type. We cannot apply a function to a number 
if it expects a tuple. 

Finally, we must be able to show that analysis of the body of a function with respect to some input 
value yields correct values for all possible values to which the function could be applied under the 
standard interpreter. 

5.2.3 Representative Input Values 

In this section we present a method for creating representative input values that allow us to avoid 
the false contamination we found in Section 5.2.1. We show that analysis performed with respect 
to these input values is safe for all possible inputs, up to renaming of the inputs. 

Let us analyze procedure foo when applied to the following representative value $v_{rep}$: 

$(\{l_{-1}\}, \underline{N}, [l_{-1} \mapsto \langle Tuple\ \underline{N}, \underline{N} \rangle])$ 

where label $l_{-1}$ is a new label that does not occur anywhere in the program. 

Now if we evaluate the body of foo and determine the bindings of w, w' and r' we get the following 
values: 

$\rho'[w] = \{l_{-1}\}$ 
$\rho'[w'] = \{l_0\}$ 
$\rho'[r'] = \underline{N}$ 

The set $I$ of labels that may be inherited by procedure foo is the set $\{l_{-1}\}$, and the set of labels 
that escapes from foo is empty. In this case, the lifetime of the object to which w' is bound is 
contained in the lifetime of the inner letrec block. 
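
The difference between the two choices of input value can be replayed in a few lines of Python. The sketch simplifies the analysis by treating every label present in the incoming store as inherited, which coincides with the sets stated above for foo; the string label names and variable names are assumptions of the example.

```python
def is_contaminated(binding, incoming_store):
    """True if a variable may be bound to a label inherited from the caller."""
    inherited = set(incoming_store)        # simplification: all incoming labels
    return bool(binding & inherited)

w_prime = frozenset({"l0"})                # w' is bound to the MakeTuple site l0

# Most general input value (Section 5.2.1): l0 is already in the incoming store.
general_store = {"l1": ("N", "N"), "l0": ("N", "N")}

# Representative input value: only the fresh label l-1 is inherited.
representative_store = {"l-1": ("N", "N")}

false_alarm = is_contaminated(w_prime, general_store)
clean = not is_contaminated(w_prime, representative_store)
```

Under the most general input w' appears inherited, while under the representative input it does not, so only the second analysis can justify deallocating w'.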



Name Invariance 

The question we must now answer is whether the behavior of foo when applied to this input value 
tells us anything about the behavior of foo when applied to other values. If we want to determine 
the behavior for an input vector that contains $l_i$ instead of $l_{-1}$, such as the following input vector 
$v$: 

$(\{l_i\}, \underline{N}, [l_i \mapsto \langle Tuple\ \underline{N}, \underline{N} \rangle, \cdots]),$ 

then we can take the bindings computed for foo applied to input vector $v_{rep}$, rename all occurrences 
of $l_{-1}$ to $l_i$, and end up with the bindings for foo applied to $l_i$. For instance, if we rename $l_{-1}$ to 
$l_i$ in our analysis of foo with respect to the representative value $v_{rep}$, we obtain the following 
information about the environment in the body of foo: 

$\rho'[w] = \{l_i\}$ 
$\rho'[w'] = \{l_0\}$ 
$\rho'[r'] = \underline{N}$ 

which is exactly the result if we directly analyzed procedure foo applied to $v$. 

If more than one location was passed as an argument, then we substitute the set of labels for the 
set containing $l_{-1}$, and duplicate the bindings of $l_{-1}$ in the store for each of the labels in the desired 
input value. 

We would like to show that we can take an appropriate representative input value, analyze a 
function with respect to that input value, and determine safe behavior for that function applied to 
any input value given the behavior of the function over the representative input value. In order for 
the behavior of a function over its representative input value to tell us anything about the behavior 
of the function over other input values, the representative input value must satisfy the following 
three conditions: 

1. The representative value must have the same type as all other input values to this function. 

2. None of the values reachable from the representative input values may be bottom. 

3. The labels used in the representative input value must be distinct from all labels occurring 
statically in the program. 

The first condition is almost redundant; all values to which a function is applied must have the 
same type. 

The second condition is required because the result of a function applied to any value that is in 
some way "less than" the representative input value will be less than the result of the function 
applied to the representative input value. If some component of the representative input value 
is bottom, then any input that has a non-bottom value for that component will cause different 
behavior. 

The third condition is required because we would like to be able to perform a substitution on the
result of a function applied to the representative input value in order to obtain the result of the
function applied to some other value. We will not get a correct result if we rename the objects
allocated within the procedure call; we only want to rename or substitute for values passed as input
to the function. After all, the most important thing we know about the representative input values
is that they came from outside the procedure.

\[
\begin{aligned}
Subst_{V}(ls, l, N) &= N \\
Subst_{V}(ls, l, B) &= B \\
Subst_{V}(ls, l, ls') &=
  \begin{cases}
    (ls' - \{l\}) \cup ls & \text{if } l \in ls' \\
    ls' & \text{otherwise}
  \end{cases} \\
Subst_{Env}(ls, l, \rho) &= \lambda x.\ Subst_{V}(ls, l, \rho[x]) \\
Subst_{SV}(ls, l, \langle \mathit{Tuple}\ v_1, \ldots, v_m \rangle) &=
  \langle \mathit{Tuple}\ Subst_{V}(ls, l, v_1), \ldots, Subst_{V}(ls, l, v_m) \rangle \\
Subst_{Store}(ls, l, \sigma) &= \lambda l'.\
  \begin{cases}
    Subst_{SV}(ls, l, \sigma[l]) & \text{if } l' \in ls \\
    Subst_{SV}(ls, l, \sigma[l']) & \text{otherwise}
  \end{cases} \\
Subst_{FEnv}(ls, l, \Phi) &=
  \bigsqcup_{f_i \in F}\ \bigsqcup_{v \text{ s.t. } \Phi[f_i][v] \neq \bot}
  \bot_{FEnv}\big[ f_i \mapsto [\, Subst(ls, l, v) \mapsto Subst(ls, l, \Phi[f_i][v]) \,] \big] \\
Subst^{*}(\{\langle ls_1, l_1 \rangle, \ldots, \langle ls_n, l_n \rangle\}, v) &=
  Subst^{*}(\{\langle ls_1, l_1 \rangle, \ldots, \langle ls_{n-1}, l_{n-1} \rangle\},\ Subst(ls_n, l_n, v)) \\
Subst^{*}(\emptyset, v) &= v
\end{aligned}
\]

Figure 5.1: Definition of procedure Subst

We define procedure Subst in Figure 5.1. This procedure takes a set of labels $ls$, a label $l$, and a
value $v$, and substitutes $ls$ for all occurrences of label $l$ in $v$. We define versions of Subst that operate
on denotable values, storable values, environments, stores, and function environments.

Let RIV be a procedure that takes a function's type and constructs a representative input vector
for that function that satisfies the three conditions given above. Let Match be a procedure that
takes an input vector $iv$ to a function and the function's representative input vector $riv$, and produces
a substitution $\theta$ such that

\[ iv \sqsubseteq Subst^{*}(\theta, riv). \]
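The substitution of Figure 5.1 can be illustrated with a small Python sketch. It is not the thesis implementation: the encoding (abstract values as the strings "N" and "B" or a frozenset of label names, stores as dictionaries from labels to tuples of values) and the names subst_value, subst_env, subst_store and join_value are assumptions made only for this example.

```python
from typing import Dict, FrozenSet, Tuple, Union

Label = str
Value = Union[str, FrozenSet[Label]]      # "N", "B", or a set of labels
Env = Dict[str, Value]
Store = Dict[Label, Tuple[Value, ...]]    # label -> fields of a tuple


def subst_value(ls: FrozenSet[Label], l: Label, v: Value) -> Value:
    """Replace label l by the label set ls inside a denotable value."""
    if isinstance(v, frozenset) and l in v:
        return (v - {l}) | ls
    return v  # scalars and label sets that do not mention l are unchanged


def subst_env(ls: FrozenSet[Label], l: Label, env: Env) -> Env:
    """Apply the substitution to every binding of an environment."""
    return {x: subst_value(ls, l, v) for x, v in env.items()}


def join_value(a: Value, b: Value) -> Value:
    """Least upper bound of two values of the same type (assumed encoding)."""
    if isinstance(a, frozenset) and isinstance(b, frozenset):
        return a | b
    if a == b:
        return a
    raise ValueError("values of different types cannot be joined")


def subst_store(ls: FrozenSet[Label], l: Label, store: Store) -> Store:
    """Rename l inside stored tuples and duplicate l's binding for each label in ls."""
    out: Store = {}
    for loc, fields in store.items():
        new_fields = tuple(subst_value(ls, l, f) for f in fields)
        targets = ls if loc == l else (loc,)
        for target in targets:
            if target in out:  # a binding already exists: join field by field
                out[target] = tuple(
                    join_value(a, b) for a, b in zip(out[target], new_fields)
                )
            else:
                out[target] = new_fields
    return out


# Rename the representative label "l_rep" to two actual input labels.
rep_env = {"w": frozenset({"l_rep"}), "x": "N"}
rep_store = {"l_rep": ("N", "N")}
actual = frozenset({"l_a", "l_b"})
print(subst_env(actual, "l_rep", rep_env))
print(subst_store(actual, "l_rep", rep_store))
```

The store case mirrors the remark above about passing more than one location: the binding of the representative label is copied to every label in the substituted set.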

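Theorem 5.2 (Name Invariance) Given program $pr$ and the function environment $\Phi$ for that
program:

\[ \forall f_i \in F,\ \exists\, riv = \mathcal{RIV}[\![\, TypeOf(f_i, pr) \,]\!], \]

\[ \forall iv \in Domain(f_i),\ \exists\, \theta = Match(iv, riv), \]

\[ iv \sqsubseteq Subst^{*}(\theta, riv) \ \Rightarrow\ \Phi[f_i][iv] \sqsubseteq Subst^{*}(\theta, \Phi[f_i][riv]) \]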

Proof:

Sketch of Proof by Contradiction:
Let:

\[ r_{iv} = \Phi[f_i][iv] \]
\[ r_{riv} = \Phi[f_i][riv] \]

Assume that: $\Phi[f_i][iv] \not\sqsubseteq Subst^{*}(\theta, \Phi[f_i][riv])$

Every portion of the results of function applications, in this case $r_{iv}$ and $r_{riv}$, either came
from the input to the function or was created within the function.

- The representative input vector $riv$ is at least as well defined as the input vector of
interest $iv$, and so execution of the body of the function should have proceeded at least
as far in the case of $riv$ as in the case of $iv$. Therefore, all portions of the result should
be at least as well defined. For that reason, all portions of the result that came from
the input should be at least as well defined in the case of $Subst^{*}(\theta, r_{riv})$ as in the case
of $r_{iv}$, unless some of the inputs reachable under one case were not reachable under the
other. But the abstract interpreter preserves reachability, so all of the components of
$Subst^{*}(\theta, r_{riv})$ that were inherited from the input vector $riv$ must contain the portions
of the result $r_{iv}$ that came from the input $iv$, because input $Subst^{*}(\theta, riv)$ contained all
of input $iv$.

Contradiction.

- Again, all code in the body of function $f_i$ must have executed at least as far when
applied to $riv$ as it did when applied to $iv$, because $riv$ is more defined (has more
non-bottom components). Therefore all portions of result $r_{riv}$ that were created within the
function body should be at least as well defined as the portions of result $r_{iv}$ that were
created within the function body. Furthermore, all of these values that are object labels
must be the same in both cases, because all object labels depend solely on the text of
the program, not the inputs to the function. Therefore, it must be the case that the
portions of result $r_{riv}$ that were created within the function must contain the portions
of $r_{iv}$ that were created within the function, even before substitution. Furthermore,
none of the labels being renamed by substitution are created within the body of the
function, because the contract of RIV is to use labels from outside of the program.

Contradiction.

Both of these paths lead to contradiction, so our assumption must be false.



There remains the question of determining whether label $l_0$ can ever, under the concrete interpreter, both
be allocated within the inner letrec block and be inherited by the body of the same instance
of procedure foo. The only way an object can be allocated within an activation of a
procedure and passed into the same activation of the procedure is if the object is returned as
part of the result and the caller of the procedure passes that value back into the procedure as
an argument. Under this condition, the object's lifetime cannot be bounded by the lifetime of
the procedure, because the object escapes as part of the procedure's result. If the object is never
returned as part of the result, then any object passed into the procedure with the same label as an
object allocated within the procedure must be an instance of an object allocated within a different
activation of the same procedure.

Theorem 5.2 has two consequences. First, it allows us to construct a representative input value for 
each function and analyze the function applied to that representative input in order to verify or 
insert deallocation commands in the body of the function. It shows us that the representative input 
vector is equivalent to the most general input vector in the most important way: distinguishing the 
values that came from outside the function from those that were created within the function. The 
use of representative input vectors in many cases allows us to avoid the false aliasing problem that 
reduces the effectiveness of the deallocation safety verification and deallocation insertion algorithms. 

Second, it allows us to derive a conservative approximation of the result of a function applied 
to a particular input value from the result of the function applied to the representative input. 
Theorem 5.2 guarantees that if the representative input is chosen appropriately, then the result 
after substitution will be an approximation of the actual result. 

Two problems remain: how to choose the representative input for a given function and how to 
choose substitutions. 



Constructing Representative Input Values for Functions 

Our approach is to generate an input value for each procedure based on its type so that if we 
analyze or transform the function for that input value the result will be correct for all input values. 
This allows us to ask more precise questions about the function because we can guarantee that 
there is no false aliasing between the inputs to the function and any structures it may allocate. 

The type of a function is a member of the domain FunctionType, defined below. Argument and
result types of a function are drawn from the domain Type.

\[ FunctionType = (Type \times \cdots \times Type) \rightarrow Type \]

\[ \tau \in Type = N \mid B \mid \langle \mathit{Tuple}\ Type \times \cdots \times Type \rangle \]

The function RIV takes a function type and returns a representative input value: a tuple of abstract
values and an abstract store. This procedure makes use of function CV, which constructs a single
value-store pair from a single type. The signatures of functions RIV and CV are given below:

\[ \mathcal{RIV} : FunctionType \rightarrow (V \times \cdots \times V \times Store) \]
\[ \mathcal{CV} : Type \rightarrow (V \times Store) \]

Finally, we need a function, TypeOf, which gives us the type of a procedure in the program $pr$.
TypeOf takes a function identifier and a program and returns the type of that function.

\[ TypeOf : F \rightarrow Prog \rightarrow FunctionType \]

The procedure CV, in the case of a scalar type, returns $N$ or $B$, as appropriate, and the empty
store. In the case of a tuple type, CV is called recursively on each of the component types, a new
label $l$ is allocated, and the set containing $l$ is returned as the value. The resulting store is the least
upper bound of each of the component stores with location $l$ bound to the tuple of the component
values.

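\[
\begin{aligned}
\mathcal{CV}[\![ N ]\!] &= \langle N, \bot_{Store} \rangle \\
\mathcal{CV}[\![ B ]\!] &= \langle B, \bot_{Store} \rangle \\
\mathcal{CV}[\![ \langle \mathit{Tuple}\ \tau_1, \ldots, \tau_n \rangle ]\!] &=
  \{\ \langle v_1, \sigma_1 \rangle = \mathcal{CV}[\![ \tau_1 ]\!]; \\
  &\qquad \vdots \\
  &\quad\ \ \langle v_n, \sigma_n \rangle = \mathcal{CV}[\![ \tau_n ]\!]; \\
  &\quad\ \ \sigma' = \bigsqcup_i \sigma_i; \\
  &\quad\ \ l = Newloc(); \\
  &\quad \text{in } \langle \{l\},\ \sigma'[\, l \mapsto \langle \mathit{Tuple}\ v_1, \ldots, v_n \rangle \,] \rangle \ \}
\end{aligned}
\]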

We call function Newloc to give us a label that cannot appear in the program; this guarantees
that we do not get any false aliasing between the initial arguments to a function and the object
labels allocated within the function.

The function RIV takes the tuple of function argument types and calls function CV to construct a
value and store for each of those types. It returns a tuple containing each of the values and the least
upper bound of the stores. Note that because we construct these stores with disjoint locations, the
least upper bound of these stores is the same as the concatenation of the stores.

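\[
\begin{aligned}
\mathcal{RIV}[\![ (\tau_1 \times \cdots \times \tau_n) \rightarrow \tau_r ]\!] =
  \{\ & \langle v_1, \sigma_1 \rangle = \mathcal{CV}[\![ \tau_1 ]\!]; \\
  & \qquad \vdots \\
  & \langle v_n, \sigma_n \rangle = \mathcal{CV}[\![ \tau_n ]\!]; \\
  & \sigma' = \bigsqcup_i \sigma_i; \\
  \text{in } & \langle v_1, \ldots, v_n, \sigma' \rangle \ \}
\end{aligned}
\]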
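The construction performed by CV and RIV can be sketched in Python as follows. This is a minimal sketch under assumed encodings, not the thesis code: types are the strings "N" and "B" or a tuple ("Tuple", t1, ..., tn), values are "N", "B" or a frozenset of labels, and the helper make_fresh stands in for Newloc by producing labels of the form l_-1, l_-2, ... that are assumed never to occur in program text.

```python
import itertools


def make_fresh():
    """Stand-in for Newloc: yields labels assumed not to occur in the program."""
    counter = itertools.count(1)
    return lambda: f"l_-{next(counter)}"


def cv(ty, fresh):
    """Build a (value, store) pair for a single type."""
    if ty in ("N", "B"):
        return ty, {}                      # scalar: abstract value, empty store
    tag, *component_types = ty
    assert tag == "Tuple", "unknown type constructor"
    values, store = [], {}
    for t in component_types:
        v, s = cv(t, fresh)
        values.append(v)
        store.update(s)                    # labels are fresh, so stores are disjoint
    label = fresh()
    store[label] = tuple(values)
    return frozenset({label}), store


def riv(arg_types, fresh):
    """Build the representative input vector for a list of argument types."""
    values, store = [], {}
    for t in arg_types:
        v, s = cv(t, fresh)
        values.append(v)
        store.update(s)
    return tuple(values), store


values, store = riv([("Tuple", "N", "N"), "N"], make_fresh())
print(values)   # (frozenset({'l_-1'}), 'N')
print(store)    # {'l_-1': ('N', 'N')}
```

Because every tuple receives its own fresh label, merging the component stores with a plain dictionary update plays the role of the least upper bound here.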

5.3 An Algorithm for Verifying Deallocation Commands 

This section defines VP, an algorithm that takes a program $pr$ and returns the set of expression
labels of possibly incorrect deallocation commands in the program. This algorithm errs on the
conservative side: it may return labels of deallocation commands that would never cause dangling pointer
errors at run-time, but it never returns the empty set for programs that are incorrect.

Procedure VP verifies programs using monotonic reasoning: it first assumes all procedures are
correct and iteratively improves this approximation until it finds all the procedures that could
have dynamic errors deallocating structures. We use a new mapping, a correctness map, that
gives the most up-to-date information about the correctness of each procedure. A correctness map
takes a procedure name and returns either the empty set, if the procedure contains only correct
deallocation commands, or a non-empty set of incorrect deallocation command labels otherwise.

\[ \Psi \in CMAP = F \rightarrow \mathcal{P}(L) \]

The function VE verifies that all deallocation commands within an expression are correct with
respect to a given environment, store, and function environment.

Here are the signatures of the verification procedures:

\[ \mathcal{VP} : Prog \rightarrow \mathcal{P}(L) \]

\[ \mathcal{VE} : E \rightarrow Env \rightarrow Store \rightarrow FEnv \rightarrow CMAP \rightarrow \mathcal{P}(L) \]

These functions are defined in the following two sections.

5.3.1 Verification of Deallocation within a Program 

The function VP verifies that all of the deallocation statements in the main body of the program 
and in each of the functions of the program cannot lead to dereferencing a deallocated location 
under any execution of the program. The definition of VP is shown below: 

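\[
\begin{aligned}
\mathcal{VP}[\![ pr ]\!] =
  \{\ & [\![ \{ \cdots f_i(x_1, \ldots, x_n) = e_i; \cdots \text{ in } e_0 \} ]\!] = pr; \\
  & \langle \Phi_0, \mathcal{D}_0 \rangle = InitialFEnv(pr); \\
  & \langle \Phi, \mathcal{D} \rangle = ComputeFEnv(pr, \Phi_0, \mathcal{D}_0); \\
  & \Psi_0[f_i] = \emptyset,\ \forall f_i; \\
  & \Psi = ComputeCMAP(pr, \Phi, \mathcal{D}, \Psi_0); \\
  & ds = \mathcal{VE}[\![ e_0 ]\!]\ \bot_{Env}\ \bot_{Store}\ \Phi\ \Psi; \\
  \text{in } & ds \ \}
\end{aligned}
\tag{5.1}
\]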

where expression $e_0$ is the body of the main function $f_0$. Procedure VP calls procedure
ComputeFEnv to compute the function environment and the interesting domain map of the program.
Then it calls ComputeCMAP to compute the correctness map for the program. The CMAP $\Psi$
takes function names and returns the set of expression labels of deallocation commands that may
be incorrect. Finally, procedure VP calls procedure VE to verify the correctness of expression $e_0$,
the body of the main procedure $f_0$. If there are no incorrect deallocation commands that may be
called from the main body of the program, then all deallocation commands in the program must
be correct (or else unreachable from the main procedure).

We revise the function InitialFEnv that takes a program and returns a function environment and
a domain map. The empty function environment is returned as the initial function environment.
The domain map we return maps each function name to the set containing the representative input
value for that function.

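\[
\begin{aligned}
InitialFEnv(pr) =
  \{\ & \Phi_0 = \bot_{FEnv}; \\
  & \forall f_i \in F: \\
  & \quad \tau_{f_i} = TypeOf(f_i, pr); \\
  & \quad \langle v_{i,1}, \ldots, v_{i,n}, \sigma_i \rangle = \mathcal{RIV}[\![ \tau_{f_i} ]\!]; \\
  & \quad \mathcal{D}_0[f_i] = \{ \langle v_{i,1}, \ldots, v_{i,n}, \sigma_i \rangle \}; \\
  \text{in } & \langle \Phi_0, \mathcal{D}_0 \rangle \ \}
\end{aligned}
\]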

The function ComputeCMAP iteratively improves the approximations of the correctness map until 
no further information is added. It returns the most precise approximation as its result. 

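\[
\begin{aligned}
ComputeCMAP(pr, \Phi, \mathcal{D}, \Psi) =
  \{\ & \Psi' = \lambda f_i.\ \bigcup_{\langle v_1, \ldots, v_n, \sigma \rangle \in \mathcal{D}[f_i]}
        \mathcal{VE}[\![ e_i ]\!]\ \bot_{Env}[v_1/x_1, \ldots, v_n/x_n]\ \sigma\ \Phi\ \Psi; \\
  & \Psi'' = \text{if } \Psi' \sqsubseteq \Psi \text{ then } \Psi \\
  & \qquad\ \ \text{else } ComputeCMAP(pr, \Phi, \mathcal{D}, \Psi'); \\
  \text{in } & \Psi'' \ \}
\end{aligned}
\tag{5.2}
\]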
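The iteration performed by ComputeCMAP is an ordinary least-fixpoint computation, which the following Python sketch illustrates. It is a simplified model, not the thesis algorithm: the callback verify_body stands in for applying the expression verifier to a function body over its domain, and the toy program (the tables local_unsafe and calls) is invented for the example. The sketch also unions each new result with the previous one, an assumption that keeps the iteration monotone.

```python
def compute_cmap(functions, verify_body):
    """Iterate the correctness map until no further labels are added."""
    cmap = {f: frozenset() for f in functions}   # start: assume all procedures correct
    while True:
        new = {f: cmap[f] | verify_body(f, cmap) for f in functions}
        if new == cmap:                          # nothing added: fixpoint reached
            return cmap
        cmap = new


# Toy program: f contains one unsafe deallocation d1 and calls g; main calls f.
local_unsafe = {"f": {"d1"}, "g": set(), "main": set()}
calls = {"f": ["g"], "g": [], "main": ["f"]}


def verify_body(f, cmap):
    """Unsafe labels of f: its own, plus those of every procedure it applies."""
    return frozenset(local_unsafe[f]).union(*(cmap[c] for c in calls[f]))


result = compute_cmap(["f", "g", "main"], verify_body)
print(result["main"])   # frozenset({'d1'})
```

Termination follows because the set of deallocation labels in a program is finite and each entry of the map only grows.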

5.3.2 Verification of Deallocation within an Expression 

This section gives a definition of algorithm VE, which takes an expression $e$, an environment, a
store, and a function environment, and returns the set of labels of deallocation commands in $e$ that
cannot be proven safe statically.

The following four clauses of VE show that VE returns the empty set for simple expressions and
primitive expressions because none of these expressions can deallocate objects.

\[ \mathcal{VE}[\![ n ]\!]\,\rho\,\sigma\,\Phi\,\Psi = \emptyset \tag{5.3} \]

\[ \mathcal{VE}[\![ b ]\!]\,\rho\,\sigma\,\Phi\,\Psi = \emptyset \tag{5.4} \]

\[ \mathcal{VE}[\![ x ]\!]\,\rho\,\sigma\,\Phi\,\Psi = \emptyset \tag{5.5} \]

\[ \mathcal{VE}[\![ +(se_1, se_2) ]\!]\,\rho\,\sigma\,\Phi\,\Psi = \emptyset \tag{5.6} \]

Function VE looks in the correctness map $\Psi$ to see if an application of procedure $f$ is correct with
respect to deallocation.

\[ \mathcal{VE}[\![ f(se_1, \ldots, se_n) ]\!]\,\rho\,\sigma\,\Phi\,\Psi = \Psi[f] \tag{5.7} \]

Verification that a function contains only correct deallocation statements is performed by procedure VP,
which tests the correctness of each function over all points in the function's domain.

Conditional statements have correct deallocation statements if both branches of the conditional
are correct. The predicate cannot have any deallocation statements. The following clause verifies
conditional expressions.

\[ \mathcal{VE}[\![ \text{if}(se, e_1, e_2) ]\!]\,\rho\,\sigma\,\Phi\,\Psi =
   \mathcal{VE}[\![ e_1 ]\!]\,\rho\,\sigma\,\Phi\,\Psi\ \cup\ \mathcal{VE}[\![ e_2 ]\!]\,\rho\,\sigma\,\Phi\,\Psi \tag{5.8} \]

The essence of the verification procedure for expressions is in the clause shown in Figure 5.2.
This clause verifies the deallocation commands in letrec blocks. This clause must compute the
environment $\rho'$ and store $\sigma'$ of the letrec block bindings. It calls VE on each of the right-hand-side
expressions with respect to $\rho'$ and $\sigma'$, collecting the results into $ds_{Bs}$. Then, it checks whether
each deallocation statement labeled $d_i$ in $Ds$ is safe. If VE cannot prove that the deallocation
statement labeled $d_i$ is safe, it collects $d_i$ into the set $ds_{Ds}$. The result is the union of the unsafe
deallocation statement labels in the body of the block and the unsafe deallocation statement labels
in $Ds$. While computing set $I$, the set of object labels reachable from the surrounding context of
a block expression, note that we use the incoming environment and store, $\rho$ and $\sigma$, instead of the
current environment and store, $\rho'$ and $\sigma'$. We can use either $\sigma$ or $\sigma'$ here because the language is
functional.

\[
\begin{aligned}
\mathcal{VE}&[\![ \{ Bs\ \text{---}\ Ds \text{ in } x \} ]\!]\,\rho\,\sigma\,\Phi\,\Psi = \\
  \{\ & [\![ x_1 = e_1; \ldots; x_n = e_n ]\!] = Bs; \\
  & [\![ {}^{d_1}\text{Dealloc}(y_1); \ldots; {}^{d_k}\text{Dealloc}(y_k) ]\!] = Ds; \\
  & \rho_0 = \rho[\bot/x_1, \ldots, \bot/x_n]; \\
  & \langle \rho', \sigma', A^{+}, A^{-} \rangle = EvalBindingsA(Bs, \Phi, \rho_0, \sigma, \bot_{dmap}); \\
  & I = \bigcup_{w \in FV(Bs)} Reachable(\rho[w], \sigma); \\
  & R = Reachable(\rho'[x], \sigma'); \\
  & ds_{Bs} = \bigcup_{1 \le i \le n} \mathcal{VE}[\![ e_i ]\!]\,\rho'\,\sigma'\,\Phi\,\Psi; \\
  & ds_{Ds} = \bigcup_{[\![ {}^{d_i}\text{Dealloc}(y_i) ]\!] \in Ds}
      \begin{cases}
        \emptyset & \text{when } \emptyset = \rho'[y_i] \cap I
                    \ \wedge\ \emptyset = \rho'[y_i] \cap R
                    \ \wedge\ \emptyset = \rho'[y_i] \cap A^{-} \\
        \{d_i\} & \text{otherwise}
      \end{cases} \\
  \text{in } & ds_{Bs} \cup ds_{Ds} \ \}
\end{aligned}
\tag{5.9}
\]

Figure 5.2: Clause to verify deallocation commands in letrec blocks
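The safety test applied to each deallocation statement of a letrec block can be sketched in Python. This is an illustrative model under assumed encodings (values are "N", "B" or a frozenset of labels; stores map labels to tuples of values); the function names and the already_freed parameter, which models labels deallocated in the bindings, are assumptions made for this example.

```python
def labels_of(value):
    """Labels denoted by an abstract value (scalars denote none)."""
    return value if isinstance(value, frozenset) else frozenset()


def reachable(roots, store):
    """All labels reachable from the given labels through the store."""
    seen, work = set(), list(roots)
    while work:
        label = work.pop()
        if label in seen:
            continue
        seen.add(label)
        for field in store.get(label, ()):
            work.extend(labels_of(field))
    return frozenset(seen)


def unsafe_deallocs(env_in, store_in, env_out, store_out,
                    free_vars, result_var, deallocs, already_freed=frozenset()):
    """Return labels of Dealloc statements that cannot be proven safe."""
    # I: labels inherited from the context, computed from the incoming state.
    inherited = frozenset().union(
        *(reachable(labels_of(env_in[w]), store_in) for w in free_vars))
    # R: labels escaping through the result of the block.
    escaping = reachable(labels_of(env_out[result_var]), store_out)
    bad = set()
    for dealloc_label, var in deallocs:
        ls = labels_of(env_out[var])
        if ls & inherited or ls & escaping or ls & already_freed:
            bad.add(dealloc_label)
    return bad


l0 = frozenset({"l0"})
env_in = {"x": "N", "y": "N"}
store_out = {"l0": ("N", "N")}
# Block whose result is a number: the tuple bound to t does not escape.
ok = unsafe_deallocs(env_in, {}, {**env_in, "t": l0, "result": "N"},
                     store_out, ["x", "y"], "result", [("d1", "t")])
# Block whose result is the tuple itself: deallocating t is unsafe.
bad = unsafe_deallocs(env_in, {}, {**env_in, "t": l0, "result": l0},
                      store_out, ["x", "y"], "result", [("d1", "t")])
print(ok, bad)   # set() {'d1'}
```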

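Procedure VE returns the empty set for tuple allocation and selection primitives, as shown below,
because they cannot contain deallocation statements.

\[ \mathcal{VE}[\![ {}^{l}\text{MakeTuple}(se_1, \ldots, se_m) ]\!]\,\rho\,\sigma\,\Phi\,\Psi = \emptyset \tag{5.10} \]

\[ \mathcal{VE}[\![ \text{Select}_i(se) ]\!]\,\rho\,\sigma\,\Phi\,\Psi = \emptyset \tag{5.11} \]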



5.3.3 Verifying Some Examples 

Now let us apply the above algorithm to both a correct and an incorrect example so that we may
observe its operation.

A Correct Example

In this example, we apply procedure VP to pr, the following KID - program: 

{ def f(x,y) =
    { t = ^{l_0}MakeTuple(x,y);
      result = ^{k_0}g(t);
      ---
      ^{d_1}Dealloc(t);
    in result };

  def g(t) =
    Select_1(t);

  def f_0() =
    ^{k_1}f(6,847)
}

Here are the domains of the abstracted versions of f and g.

\[ Domain(f) = N \times N \times \bot_{Store} \]

\[ Domain(g) = \{\emptyset, \{l_0\}\} \times \{ [\, l_0 \mapsto \langle \mathit{Tuple}\ N, N \rangle \,],\ [\, l_0 \mapsto \bot \,] \} \]

where the following are the program dependent domains:

\[ OL = \{l_0\} \]

\[ Ls = \{\emptyset, \{l_0\}\} \]

In order to verify the correctness of the deallocation commands in program $pr$, we must verify the
deallocation commands in each procedure in the program for all input vectors in the domains of
those procedures (Equation 5.1). For expository purposes, we shortcut the iterative computation
of the correctness map $\Psi$.

First, we verify the correctness of procedure f. There is only one input vector of interest for
procedure f, so the value of $\Psi[f]$ is the result of VE applied to the body of f and this input vector.

\[ \Psi[f] = \mathcal{VE}[\![ e_f ]\!]\ \bot_{Env}[N/x, N/y]\ \bot_{Store}\ \Phi\ \Psi \]

e_f = { t = ^{l_0}MakeTuple(x,y);
        result = ^{k_0}g(t);
        ---
        ^{d_1}Dealloc(t);
      in result };

First, let us compute the value of $ds_{Bs}$, the labels of the incorrect deallocation commands in the
body of the letrec block. We find:

\[ ds_{Bs} = \mathcal{VE}[\![ {}^{l_0}\text{MakeTuple}(x,y) ]\!]\,\rho'\,\sigma'\,\Phi\,\Psi
   \ \cup\ \mathcal{VE}[\![ {}^{k_0}g(t) ]\!]\,\rho'\,\sigma'\,\Phi\,\Psi \]

by Equation 5.9, where $e_f$ is the body of procedure f and $\rho'$, $\sigma'$, $A^{-}$, $I$, $R$ and $ls_t$ are computed
by procedure VE:

\[
\begin{aligned}
\rho' &= \bot_{Env}[\, N/x,\ N/y,\ \{l_0\}/t,\ N/result \,] \\
\sigma' &= \bot_{Store}[\, l_0 \mapsto \langle \mathit{Tuple}\ N, N \rangle \,] \\
A^{-} &= \emptyset \\
I &= Reachable(\rho'[x], \sigma) \cup Reachable(\rho'[y], \sigma) = \emptyset \\
R &= Reachable(\rho'[result], \sigma') = \emptyset \\
ls_t &= \rho'[t] = \{l_0\}
\end{aligned}
\]

Using these values, we can check the correctness of the two expressions from the body of the block
expression. There are no incorrect deallocation commands in the MakeTuple expression, so VE
returns the empty set. To apply VE to the application of procedure g, we must compute the entry
$\Psi[g]$.

If we apply procedure VE to the body of procedure g, VE returns the empty set because there are
no deallocation commands or procedure applications in the body of g. Consequently, the entry in
$\Psi$ for g contains the empty set.

\[ \Psi[g] = \emptyset \]

Going back to the verification of the body of procedure f, we find:

\[ \mathcal{VE}[\![ {}^{l_0}\text{MakeTuple}(x,y) ]\!]\,\rho'\,\sigma'\,\Phi\,\Psi = \emptyset \]

\[ \mathcal{VE}[\![ {}^{k_0}g(t) ]\!]\,\rho'\,\sigma'\,\Phi\,\Psi = \Psi[g] = \emptyset \]

by Equations 5.10 and 5.7. Therefore, the bindings of the letrec block in f contain no unsafe
deallocation commands.

Now let us consider the deallocation command in the body of f. Using the values computed above,
we see that the set of labels that may be deallocated, $\{l_0\}$, has a null intersection with both $I$ and
$R$, which are the sets of inherited and escaping locations. Therefore, this deallocation command
satisfies the safety condition, so we can conclude that it will never lead to a run-time error.

Since both $ds_{Bs}$ and $ds_{Ds}$ are empty, the result of the call to VE on the body of f is the empty set,
and the entry in $\Psi$ for f also contains the empty set.

\[ \Psi[f] = \emptyset \]



An Incorrect Example 

Now let us apply the verification algorithm to a program containing an incorrect deallocation 
command. The program is a slight variation of the program from the previous example, in which 
procedure g returns its argument. We apply VP to the following program pr: 

{ def f(x,y) =
    { t = ^{l_0}MakeTuple(x,y);
      result = ^{k_0}g(t);
      ---
      ^{d_1}Dealloc(t);
    in result };

  def g(t) = t;

  def f_0() =
    ^{k_1}f(6,847)
}

Again, we verify the correctness of the deallocation commands in the program by verifying each of 
the procedure bodies over each input in the domains of the function. The procedures in program 
pr have the same domains as in the previous example. 

Procedure g still contains no deallocation commands or procedure applications, so VE returns the
empty set when applied to the body of g. Therefore, the entry in $\Psi$ for g is the empty set.

\[ \Psi[g] = \emptyset \]

We proceed to verify the safety of the deallocation commands in procedure f. There is one input
value in the domain of f to consider. We call VE on expression $e_f$ and input value $\langle N, N, \bot_{Store} \rangle$.

\[ \mathcal{VE}[\![ e_f ]\!]\ \bot_{Env}[N/x, N/y]\ \bot_{Store}\ \Phi\ \Psi \]

where $e_f$ is the body of procedure f:

e_f = { t = ^{l_0}MakeTuple(x,y);
        result = ^{k_0}g(t);
        ---
        ^{d_1}Dealloc(t);
      in result };

To apply VE to the body of procedure f, we first compute $\rho'$ and $\sigma'$:

\[
\begin{aligned}
\rho' &= \bot_{Env}[\, N/x,\ N/y,\ \{l_0\}/t,\ \{l_0\}/result \,] \\
\sigma' &= \bot_{Store}[\, l_0 \mapsto \langle \mathit{Tuple}\ N, N \rangle \,] \\
A^{-} &= \emptyset
\end{aligned}
\]

Then we compute $ds_{Bs}$, the labels of the incorrect deallocation commands in the bindings of the letrec block.

\[ ds_{Bs} = \mathcal{VE}[\![ {}^{l_0}\text{MakeTuple}(x,y) ]\!]\,\rho'\,\sigma'\,\Phi\,\Psi
   \ \cup\ \mathcal{VE}[\![ {}^{k_0}g(t) ]\!]\,\rho'\,\sigma'\,\Phi\,\Psi \]

by Equation 5.9. Using these values, we can see that the deallocations in the bindings of the letrec
block are correct, as in the previous example. Then we compute the other values needed:

\[
\begin{aligned}
I &= Reachable(\rho'[x], \sigma') \cup Reachable(\rho'[y], \sigma') = \emptyset \\
R &= Reachable(\rho'[result], \sigma') = \{l_0\} \\
ls_t &= \rho'[t] = \{l_0\}
\end{aligned}
\]

If we consider the deallocation command labeled $d_1$, we see that the set of locations it may
deallocate, $\{l_0\}$, intersects the set $R$ of locations reachable from the result of the letrec block. This
deallocation command violates the safety condition (it may lead to a dangling pointer error at
run-time), so VE returns the set containing $d_1$ for the letrec block. Consequently, the entry in
map $\Psi$ for procedure f is $\{d_1\}$.

\[ \Psi[f] = \{d_1\} \]



5.4 An Algorithm for Inserting Deallocation Commands 

This section describes a simple algorithm for inserting correct deallocation commands into KID -
programs. This algorithm only deallocates objects that are directly named in the control region
that bounds the lifetime of the object. To be more complete, the algorithm would have to insert
bindings from new variables to the nested components of dead structures in order to deallocate
them. The details of this are left to the reader.

First, we look at the transformations we expect the deallocation insertion algorithm to perform. 
Then in the next four sections we develop the actual algorithms for inserting deallocation code. 

5.4.1 Desired Results of Insertion Algorithm 

Let us look at a few examples. In the following code fragment, we should be able to determine
that variable $x_3$ can name the same objects as $x_1$ and $x_2$, but that $x_1$ and $x_2$ must be bound to
different objects. Therefore, the best transformation would be to deallocate $x_1$ and $x_2$ but not $x_3$,
as shown:

{ x1 = ^{l_1}MakeTuple(3,4);
  x2 = ^{l_2}MakeTuple(3,4);
  x3 = If p then x1 else x2;
in 7 }

  ==>

{ x1 = ^{l_1}MakeTuple(3,4);
  x2 = ^{l_2}MakeTuple(3,4);
  x3 = If p then x1 else x2;
  ---
  Dealloc(x1);
  Dealloc(x2);
in 7 }

There is another correct way to transform this program. We could have inserted a deallocation
command for identifier $x_3$ instead of the commands put in for identifiers $x_1$ and $x_2$, as follows:

{ x1 = ^{l_1}MakeTuple(3,4);
  x2 = ^{l_2}MakeTuple(3,4);
  x3 = If p then x1 else x2;
  ---
  Dealloc(x3);
in 7 }

This transformation is not as good as the previous one because it only deallocates one of the tuples
that are allocated when both could be deallocated. When inserting deallocation commands, we
should try to find as many variables that are bound to non-overlapping sets of labels as possible,
and insert deallocation commands on these variables.

It is not always possible to insert deallocation commands that deallocate all dead structures if we do
not insert conditional deallocation commands. In the following example, we can insert deallocation
commands for $x_1$ and $x_2$, but not $x_3$, because it may be bound to the same tuple as $x_1$.

{ x1 = ^{l_1}MakeTuple(3,4);
  x2 = ^{l_2}MakeTuple(3,4);
  x3 = If p then x1
       else ^{l_3}MakeTuple(68,47);
in 7 }

  ==>

{ x1 = ^{l_1}MakeTuple(3,4);
  x2 = ^{l_2}MakeTuple(3,4);
  x3 = If p then x1
       else ^{l_3}MakeTuple(68,47);
  ---
  Dealloc(x1);
  Dealloc(x2);
in 7 }

However, if we insert a conditional deallocation command, then we can deallocate all of the tuples
that are allocated, as shown below:

{ x1 = ^{l_1}MakeTuple(3,4);
  x2 = ^{l_2}MakeTuple(3,4);
  x3 = If p then x1
       else ^{l_3}MakeTuple(68,47);
  ---
  Dealloc(x1);
  Dealloc(x2);
  if (x3 ≠ x1) then
    { ---
      Dealloc(x3) }
  else { };
in 7 }

In fact, we can always take the set of all of the variables in a letrec block which are bound to
objects whose lifetime is definitely contained in the lifetime of the block, and insert conditionals
to guarantee that each distinct object to which the variables are bound at run time is deallocated
exactly once.
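The run-time effect of such conditionals can be simulated with a short Python sketch. It only models the idea (compare each candidate against the objects already freed, by identity, before freeing it); the function dealloc_each_once and the free callback are hypothetical names, and Python lists stand in for heap tuples.

```python
def dealloc_each_once(objects, free):
    """Free each distinct object exactly once, using identity comparisons."""
    freed = []
    for obj in objects:
        # Plays the role of the inserted test "x3 != x1": skip aliases.
        if all(obj is not earlier for earlier in freed):
            free(obj)
            freed.append(obj)
    return len(freed)


x1 = [3, 4]
x2 = [3, 4]          # equal contents, but a distinct object
log = []

# Then-branch taken: x3 aliases x1, so only two objects are freed.
count_alias = dealloc_each_once([x1, x2, x1], log.append)

# Else-branch taken: x3 is a third tuple, so three objects are freed.
count_fresh = dealloc_each_once([x1, x2, [68, 47]], lambda obj: None)

print(count_alias, count_fresh)   # 2 3
```

The comparison must be on object identity rather than contents, since x1 and x2 above hold equal values yet are separate allocations.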

Yet another way we can transform this example is to insert a call to copy on the true side of the
conditional, so that the object bound to $x_3$ is always different from that bound to variable $x_1$. This
may not make sense in this particular case, because it costs more to allocate an object than to
perform an equality test (as we did in the previous transformation of this example). Inserting a call
to copy makes sense if this expression is executed many times and it is much more likely to take
the else branch than the then branch of the conditional. Then the amortized cost of the extra
copy will be much less than the cost of the conditional before the deallocation command.

{ x1 = ^{l_1}MakeTuple(3,4);
  x2 = ^{l_2}MakeTuple(3,4);
  x3 = If p then
         ^{k_1}copy(x1)
       else ^{l_3}MakeTuple(68,47);
  ---
  Dealloc(x1);
  Dealloc(x2);
  Dealloc(x3);
in 7 }



The following example shows that we may have to insert code in order to name all of the objects
that may be deallocated. We can insert a deallocation on variable x directly, but we must insert
a binding to name the second component of the object named by x, which is also a structure that
may be deallocated in the outer block expression.

{ x = { y = ^{l_1}MakeTuple(3,4);
        z = ^{l_2}MakeTuple(4,y);
      in z }
in 7 }

  ==>

{ x = { y = ^{l_1}MakeTuple(3,4);
        z = ^{l_2}MakeTuple(4,y);
      in z };
  w = Select_2(x);
  ---
  Dealloc(x);
  Dealloc(w);
in 7 }



5.4.2 The Algorithm 

This section presents a simple algorithm for inserting deallocation commands in KID - programs.
This algorithm only inserts commands to deallocate tuples that are named by variables in the
program. It does not insert bindings to name components of tuples whose lifetimes are bounded by
that of the block. It also does not insert conditionals after the barrier. Once the basic algorithm is
understood, it is straightforward to increase its effectiveness by having it insert code to deallocate
the components of dead structures and insert conditional deallocation commands to deallocate all
structures bound to identifiers that may be aliased.

The algorithm works in a greedy fashion on the set of identifiers bound in a block. It inserts 
deallocation commands for each identifier that is bound to a set of labels that satisfies these three 
conditions: 

1. The lifetime of each of the labels in the set is enclosed by that of the block. 

2. None of the labels are deallocated by one of the deallocation commands inserted earlier. 

3. None of the labels are deallocated elsewhere in the program. 
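The greedy selection described by these three conditions can be sketched in Python. This is a simplified model rather than the thesis procedures: the function choose_deallocs and its parameters are hypothetical, abstract values are "N", "B" or a frozenset of labels, and condition 1 is approximated by requiring that no label is inherited from the context or escapes through the result.

```python
def choose_deallocs(bound_vars, env, inherited, escaping, freed_elsewhere):
    """Greedily pick identifiers whose label sets may safely be deallocated."""
    chosen, taken = [], set()
    for x in bound_vars:
        labels = env[x]
        if not isinstance(labels, frozenset) or not labels:
            continue                      # scalars name no object
        if labels & (inherited | escaping):
            continue                      # condition 1: lifetime not enclosed by block
        if labels & taken:
            continue                      # condition 2: overlaps an earlier insertion
        if labels & freed_elsewhere:
            continue                      # condition 3: deallocated elsewhere
        chosen.append(x)
        taken |= labels
    return chosen


none = frozenset()
env = {
    "x1": frozenset({"l1"}),
    "x2": frozenset({"l2"}),
    "x3": frozenset({"l1", "l2"}),        # x3 = If p then x1 else x2
}
print(choose_deallocs(["x1", "x2", "x3"], env, none, none, none))  # ['x1', 'x2']
print(choose_deallocs(["x3", "x1", "x2"], env, none, none, none))  # ['x3']
```

The two calls show that a greedy pass is sensitive to the order in which identifiers are visited: visiting x3 first yields the weaker transformation that deallocates only one tuple.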

This algorithm is implemented by four procedures: TV and TE, which transform programs and
expressions, and VS and VX, which return a list of deallocation statements for lists of bound
variables and single bound variables in a given letrec block.

Here are the signatures of procedures used in the insertion algorithm:

TV : Prog → Prog
TE : Exp → Env → Store → FEnv → Exp
VS : X* → Env → Store → Ls → Ls → Ls → DS
VX : V → X → Ls → Ls → Ls → (DS × Ls)



Procedure TV takes a program, computes the function environment for the program, and calls
procedure TE to insert the appropriate deallocation statements in the body of each procedure in
the program. Procedure TE takes an expression $e$ and the most general environment, store, and
function environment in which that expression executes. It returns a transformed expression $e'$.

\[
\begin{aligned}
\mathcal{TV}[\![ pr ]\!] =
  \{\ & [\![ \{ \cdots f_i(x_1, \ldots, x_n) = e_i \cdots \} ]\!] = pr; \\
  & \mathcal{D}_0 = InitialDMAP(pr); \\
  & \langle \Phi, \mathcal{D} \rangle = ComputeFEnvA(pr, \bot_{FEnv}, \mathcal{D}_0); \\
  & \forall f_i \in \{f_0, \ldots, f_k\}: \\
  & \quad \tau_{f_i} = TypeOf(f_i, pr); \\
  & \quad \langle v_{i,1}, \ldots, v_{i,n}, \sigma_i \rangle = \mathcal{RIV}[\![ \tau_{f_i} ]\!]; \\
  & \quad e'_i = \mathcal{TE}[\![ e_i ]\!]\ \bot_{Env}[v_{i,1}/x_1, \ldots, v_{i,n}/x_n]\ \sigma_i\ \Phi; \\
  \text{in } & [\![ \{ \cdots f_i(x_1, \ldots, x_n) = e'_i \cdots \} ]\!] \ \}
\end{aligned}
\tag{5.12}
\]

Figure 5.3: Procedure to insert deallocation commands into programs

The procedures VS and VX are used by procedure TE when translating letrec blocks. Procedure 
VS takes the set of variables bound by the letrec block, and the environment and store that are 
active in the bindings of the letrec block. In addition to the environment and store, it takes the 
set of inherited, escaping, and previously deallocated object labels. The inherited labels are those 
that are reachable from the context of the letrec block, the escaping labels are those reachable 
from the result of the letrec block, and the previously deallocated labels are those deallocated in 
the bindings of the letrec block. Procedure VS returns the set of deallocation commands for the 
identifiers that are determined to be safely deallocatable. 

Procedure VS calls procedure VX on each bound identifier. Procedure VX is the procedure that 
actually generates a deallocation command for an identifier x when it is safe to deallocate the 
value of that identifier in a particular context. If procedure VX is applied to a variable x, and 
the deallocation safety condition is met for x, then VX returns a deallocation command for x. 
Procedure VX takes as input the binding of x, the variable x, and the sets of inherited, escaping, 
and previously deallocated object labels, and returns a set of deallocation commands and the set 
of object labels that would be deallocated by those commands. 

5.4.3 Inserting Deallocation Commands in Programs 

Procedure TV, shown in Figure 5.3, takes a program pr and returns a new program with deallocation 
statements added to the bodies of each of the procedures in pr and the main expression of pr. 
Procedure TV calls procedure TE on each function body and the main expression of the program 
with the most general environment and store in which those expressions could be evaluated. It 
then reassembles the transformed expressions into a new program $pr'$.

5.4.4 Inserting Deallocation Commands in Expressions

The procedure TE, which inserts deallocation commands into expressions, takes an expression, an 
environment, a store, and a function environment, and returns a new expression and the new set 
of labels that are deallocated during the execution of the new expression. 

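Procedure TE does not insert any deallocation statements in simple expressions and primitive
expressions. The clauses shown below handle the processing of these expressions. The result
returned from the function TE is a syntactic expression. These values are surrounded with syntax
brackets, i.e., $[\![ x ]\!]$, to show that they are new program text.

\[
\begin{array}{ll}
\mathcal{TE}[\![ n ]\!]\,\rho\,\sigma\,\Phi = [\![ n ]\!] & \text{where $n$ is a number} \\
\mathcal{TE}[\![ b ]\!]\,\rho\,\sigma\,\Phi = [\![ b ]\!] & \text{where $b$ is a boolean} \\
\mathcal{TE}[\![ x ]\!]\,\rho\,\sigma\,\Phi = [\![ x ]\!] & \text{where $x$ is a variable} \\
\mathcal{TE}[\![ +(se_1, se_2) ]\!]\,\rho\,\sigma\,\Phi = [\![ +(se_1, se_2) ]\!] &
\end{array}
\]

As shown below, no changes are made to function application expressions. All changes will be made
to the body of the function $f_i$ when it is transformed.

\[
\mathcal{TE}[\![ {}^{k}f(se_1, \ldots, se_n) ]\!]\,\rho\,\sigma\,\Phi = [\![ {}^{k}f(se_1, \ldots, se_n) ]\!]
\]

Procedure TE processes conditional expressions by generating a new conditional with both branches
transformed, as shown below:

\[
\begin{array}{l}
\mathcal{TE}[\![ \mathit{if}(se, e_1, e_2) ]\!]\,\rho\,\sigma\,\Phi = \\
\quad \{\ [\![ e_1' ]\!] = \mathcal{TE}[\![ e_1 ]\!]\,\rho\,\sigma\,\Phi; \\
\quad \phantom{\{\ } [\![ e_2' ]\!] = \mathcal{TE}[\![ e_2 ]\!]\,\rho\,\sigma\,\Phi; \\
\quad \mathit{in}\ [\![ \mathit{if}(se, e_1', e_2') ]\!]\ \}
\end{array}
\]

No changes need to be made to tuple manipulation primitives:

\[
\begin{array}{l}
\mathcal{TE}[\![ {}^{l}\mathit{MakeTuple}(se_1, \ldots, se_m) ]\!]\,\rho\,\sigma\,\Phi = [\![ {}^{l}\mathit{MakeTuple}(se_1, \ldots, se_m) ]\!] \\
\mathcal{TE}[\![ \mathit{Select}_i(se) ]\!]\,\rho\,\sigma\,\Phi = [\![ \mathit{Select}_i(se) ]\!]
\end{array}
\]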

As in the procedure for verification, the processing of letrec blocks is where most of the work
is done during program transformation. First, the environment, store, and set of object labels
deallocated by the letrec block must be computed. Then new binding right-hand-sides must be
generated by transforming the old bindings. Then the set of labels reachable from the result must
be computed. A new set of deallocation statements is generated by calling procedure VS with the
set of identifiers bound by the letrec block, the environment, the store, and the sets of inherited,
escaping, and previously deallocated labels. Finally, the new right-hand-side expressions and
deallocation statements are assembled into a new letrec block and returned. The definition of this
clause is shown in Figure 5.4.

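\[
\begin{array}{l}
\mathcal{TE}[\![ \{ Bs \ \text{---}\ Ds\ \mathit{in}\ x \} ]\!]\,\rho\,\sigma\,\Phi = \\
\quad \{\ [\![ x_1 = e_1; \ldots; x_n = e_n ]\!] = Bs; \\
\quad \phantom{\{\ } [\![ \mathit{Dealloc}(y_1); \ldots; \mathit{Dealloc}(y_k) ]\!] = Ds; \\
\quad \phantom{\{\ } \rho_0 = \rho[\bot/x_1, \ldots, \bot/x_n]; \\
\quad \phantom{\{\ } (\rho', \sigma', \Delta^{-}, \Delta^{v}) = \mathit{EvalBindings}_A(Bs, \Phi, \rho_0, \sigma, \bot_{dmap}); \\
\quad \phantom{\{\ } [\![ e_1' ]\!] = \mathcal{TE}[\![ e_1 ]\!]\,\rho'\,\sigma'\,\Phi; \\
\quad \phantom{\{\ } \quad \vdots \\
\quad \phantom{\{\ } [\![ e_n' ]\!] = \mathcal{TE}[\![ e_n ]\!]\,\rho'\,\sigma'\,\Phi; \\
\quad \phantom{\{\ } [\![ Bs' ]\!] = [\![ x_1 = e_1'; \ldots; x_n = e_n' ]\!]; \\
\quad \phantom{\{\ } \Delta^{-}_1 = \rho'[y_1]; \\
\quad \phantom{\{\ } \quad \vdots \\
\quad \phantom{\{\ } \Delta^{-}_k = \rho'[y_k]; \\
\quad \phantom{\{\ } \Delta^{-\prime} = \Delta^{-} \cup \bigcup_i \Delta^{-}_i; \\
\quad \phantom{\{\ } I = \bigcup_{w \in FV(Bs)} \mathit{Reachable}(\rho[w], \sigma); \\
\quad \phantom{\{\ } R = \mathit{Reachable}(\rho'[x], \sigma'); \\
\quad \phantom{\{\ } [\![ Ds' ]\!] = \mathcal{VS}[\![ x_1, \ldots, x_n ]\!]\,\rho'\,\sigma'\,I\,R\,\Delta^{-\prime} \\
\quad \mathit{in}\ [\![ \{ Bs' \ \text{---}\ Ds; Ds'\ \mathit{in}\ x \} ]\!]\ \}
\end{array}
\]

Figure 5.4: Clause to insert deallocation commands in letrec blocks

Procedure VS takes a list of the identifiers of a letrec block, the environment and store of the
body of the block, the set of labels of objects reachable from the context of the block, the set of
labels of objects reachable from the result of the block, and the set of labels deallocated by the
block bindings. It returns a deallocation statement for each bound identifier whose value satisfies
Condition 5.1, and the set of labels deallocated by those deallocation commands. It calls procedure
VX on each identifier. Procedure VX generates a deallocation command for each identifier that
satisfies the safety condition.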

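\[
\begin{array}{l}
\mathcal{VS}[\![ x_1, \ldots, x_n ]\!]\,\rho\,\sigma\,I\,R\,\Delta^{-} = \\
\quad \{\ ([\![ Ds_1 ]\!], \Delta^{-}_1) = \mathcal{VX}(\rho[x_1])\,[\![ x_1 ]\!]\,I\,R\,\Delta^{-}; \\
\quad \phantom{\{\ } \quad \vdots \\
\quad \phantom{\{\ } ([\![ Ds_n ]\!], \Delta^{-}_n) = \mathcal{VX}(\rho[x_n])\,[\![ x_n ]\!]\,I\,R\,\bigl(\Delta^{-} \cup \bigl(\bigcup_{i<n} \Delta^{-}_i\bigr)\bigr) \\
\quad \mathit{in}\ [\![ Ds_1; \ldots; Ds_n ]\!]\ \}
\end{array}
\]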

5.4.5 Generating Deallocation Statements 

Procedure VX takes the value of a bound variable $x_i$, the variable $x_i$, and the sets of labels
inherited by, escaping from, and deallocated by the surrounding letrec block. It returns a deallocation
command for $x_i$ if it is safe to deallocate the value of $x_i$ upon termination of the surrounding
letrec block. The value of $x_i$ may be safely deallocated if $x_i$ is bound to a reference to a structure
and that structure is allocated within the current letrec block, is not reachable from the result of
that letrec block, and cannot be deallocated by any other deallocation command.

\[
\begin{array}{l}
\mathcal{VX}(\bot)\,[\![ x ]\!]\,I\,R\,\Delta^{-} = ([\![\ ]\!], \emptyset) \\
\mathcal{VX}(N)\,[\![ x ]\!]\,I\,R\,\Delta^{-} = ([\![\ ]\!], \emptyset) \\
\mathcal{VX}(B)\,[\![ x ]\!]\,I\,R\,\Delta^{-} = ([\![\ ]\!], \emptyset) \\
\mathcal{VX}(ls)\,[\![ x ]\!]\,I\,R\,\Delta^{-} = \mathit{if}\ \emptyset = ls \cap I\ \wedge\ \emptyset = ls \cap R\ \wedge\ \emptyset = ls \cap \Delta^{-} \\
\qquad \mathit{then}\ ([\![ \mathit{Dealloc}(x) ]\!], ls) \\
\qquad \mathit{else}\ ([\![\ ]\!], \emptyset)
\end{array}
\]

The final clause of VX actually inserts all of the deallocation commands. The set of object labels
to which $x$ may be bound is $ls$, the set of locations passed into the current expression from the
surrounding context is $I$, the set of locations reachable from the result of the expression is $R$, and
the set of locations deallocated elsewhere is $\Delta^{-}$. A deallocation is only inserted if $ls$ is disjoint
from $I$, if $ls$ is disjoint from $R$, and if $ls$ is disjoint from $\Delta^{-}$. If these three conditions are met, then
a deallocation command is returned, along with the set $ls$ of locations it may deallocate.
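
The three disjointness tests, and the way VS threads the set of already-deallocated labels from one
call of VX to the next, can be sketched in Python. This is an illustrative sketch only, not the
implementation described in this thesis: the names `vx` and `vs`, the encoding of abstract values as
either a `frozenset` of labels or a string tag, and the rendering of commands as strings are all
assumptions made for the example.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

Label = str
# Assumed encoding: an abstract value is either a set of object labels (a
# reference) or a tag such as "N", "B", or "bottom" for non-reference values.
AbsValue = Union[FrozenSet[Label], str]


def vx(value: AbsValue, var: str, inherited: FrozenSet[Label],
       escaping: FrozenSet[Label], deallocated: FrozenSet[Label]
       ) -> Tuple[List[str], FrozenSet[Label]]:
    """Return the deallocation commands for one variable and the labels they free."""
    if not isinstance(value, frozenset):
        return [], frozenset()  # numbers, booleans, bottom: nothing to free
    ls = value
    safe = (ls.isdisjoint(inherited) and ls.isdisjoint(escaping)
            and ls.isdisjoint(deallocated))
    return ([f"Dealloc({var})"], ls) if safe else ([], frozenset())


def vs(variables: List[str], env: Dict[str, AbsValue],
       inherited: FrozenSet[Label], escaping: FrozenSet[Label],
       deallocated: FrozenSet[Label]) -> List[str]:
    """Call vx on each bound variable, threading the labels freed so far."""
    commands: List[str] = []
    freed = deallocated
    for var in variables:
        cmds, labels = vx(env[var], var, inherited, escaping, freed)
        commands += cmds
        freed = freed | labels  # later variables must not free these again
    return commands
```

Because the freed labels accumulate, two variables that may be bound to the same object yield at most
one deallocation command between them.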

A more aggressive algorithm for inserting deallocation commands would examine the contents of 
any tuple it might deallocate to see if any of its components was also a structure that could be 
deallocated. If so, then more deallocation commands could be inserted, along with corresponding 
bindings of new variables to selection expressions in order to name the appropriate tuple compo- 
nents. 

We do not present a more aggressive algorithm here. A more aggressive algorithm is basically the 
same as the one just discussed but augmented in places to track more information and to generate 
more complicated deallocation code. In Chapter 10, we discuss the deallocation command insertion 
algorithm that we implemented. 

5.4.6 Transforming Some Examples 

In this section we apply function TP to an example to see the process of inserting deallocation 
commands. We will walk through the transformation of the example from Section 5.3.3 with the 
deallocation statement removed. For reference, here is the text of the modified program pr: 

{ def f(x,y) =
    { t = ^{l_0}MakeTuple(x,y);
      result = ^{k_0}g(t);
     in result };

  def g(t) =
    Select_1(t);

  def f_0() =
    ^{k_1}f(6,847);
}

First, we determine the types of f and g in program pr:

\[
\begin{array}{l}
\mathit{TypeOf}(f, pr) = (N \times N) \rightarrow N \\
\mathit{TypeOf}(g, pr) = (\mathit{Tuple}\ N, N) \rightarrow N
\end{array}
\]

We use these types to construct representative input vectors:

\[
\begin{array}{l}
\mathcal{REV}[\![ (N \times N) \rightarrow N ]\!] = (N, N, \bot_{Store}) \\
\mathcal{REV}[\![ (\mathit{Tuple}\ N, N) \rightarrow N ]\!] = (\{l_{-1}\}, [l_{-1} \mapsto (\mathit{Tuple}\ N, N)])
\end{array}
\]

Procedure TP (Equation 5.12) computes the function environment for the program and then
constructs the following program:

{ def f(x,y) = e'_f

  def g(t) = e'_g

  def f_0() = e'_{f_0}
}

where

\[
\begin{array}{l}
e'_f = \mathcal{TE}[\![ e_f ]\!]\,\bot_{Env}[N/x, N/y]\,\bot_{Store}\,\Phi \\
e'_g = \mathcal{TE}[\![ e_g ]\!]\,\bot_{Env}[\{l_{-1}\}/t]\,[l_{-1} \mapsto (\mathit{Tuple}\ N, N)]\,\Phi \\
e'_{f_0} = \mathcal{TE}[\![ e_{f_0} ]\!]\,\bot_{Env}\,\bot_{Store}\,\Phi
\end{array}
\]

and $e_f$ is the body of procedure f, $e_g$ is the body of procedure g, and $e_{f_0}$ is the body of
procedure $f_0$.

Let us follow the transformation of the body of procedure f. First we need to compute a number
of values:

\[
\begin{array}{rcl}
[\![ \{ BS \ \text{---}\ DS\ \mathit{in}\ x \} ]\!] & = & e_f \\
[\![ t = e_1; \mathit{result} = e_2 ]\!] & = & BS \\
[\![\ ]\!] & = & DS \\
\rho' & = & [\{l_0\}/t, N/\mathit{result}, N/x, N/y] \\
\sigma' & = & [l_0 \mapsto (\mathit{Tuple}\ N, N)] \\
\Delta^{-\prime} & = & \emptyset
\end{array}
\]

where $\rho'$ and $\sigma'$ are the environment and store of the body of expression $e_f$ and $\Delta^{-\prime}$ is the set of
labels of objects deallocated in $e_f$.

These values are computed by applying EvalBindings to the bindings of the letrec block, the
current activation label, and the current function environment. This procedure finds the fixpoint
of the resulting environment, store, and set of deallocated objects' labels. From these values, we
compute the additional values necessary to test the safety of deallocating the value bound to each
identifier at run-time:

\[
\begin{array}{rcl}
I & = & \emptyset \\
R & = & \emptyset
\end{array}
\]

No labels are inherited by the body of procedure f from the surrounding context, and no labels are
returned from f.

Now we apply function VS to the bound identifiers of the letrec block. This procedure calls VX
on variables t and result and the value to which each identifier is bound, along with $I$, $R$, and
$\Delta^{-\prime}$:

\[
\mathcal{VX}(\{l_0\})\,[\![ t ]\!]\,I\,R\,\Delta^{-\prime} = ([\![ \mathit{Dealloc}(t) ]\!], \{l_0\})
\]

This call returns a deallocation command for identifier t and a set containing object label $l_0$ because
it is safe to deallocate the value bound to t upon termination of the control region containing the
block bindings.

The call to VX on identifier result follows:

\[
\mathcal{VX}(N)\,[\![ \mathit{result} ]\!]\,I\,R\,\Delta^{-\prime} = ([\![\ ]\!], \emptyset)
\]

No deallocation command is returned in this case because result is not bound to any objects.

The other two procedures in the program, g and $f_0$, are unchanged because there are no objects
safe to deallocate in those procedures.

The transformed program is: 

{ def f(x,y) =
    { t = ^{l_0}MakeTuple(x,y);
      result = ^{k_0}g(t);
      ---
      Dealloc(t)
     in result };

  def g(t) =
    Select_1(t);

  def f_0() =
    ^{k_1}f(6,847);
}

as we expected. 



5.5 Summary 

In this chapter we developed algorithms for performing object lifetime analysis and used this lifetime 
information to verify or insert object deallocation commands. This analysis technique is based on 
an abstraction of the operational semantics of KID - . 

In the next few chapters, we extend the analysis framework to handle more data types and higher- 
order functions. We also improve the modeling of activation labels to yield more precise information 
about the sharing of objects. 



Chapter 6 

Improving the Abstract Object 
Labels 



In this chapter we look at a more informative abstraction of activation labels that yields better
information about the identity and lifetime of objects allocated by programs. First, we introduce 
a new abstraction based on regular expressions that partitions standard activation labels into 
equivalence classes. Next, we present the changes to the abstract interpreter definition necessary 
to use these activation labels. Finally, we analyze an example using these activation labels. 



6.1 A Better Abstraction of Activation Labels 

In Chapter 4, we saw one way to abstract activation labels so that abstract interpretation was 
guaranteed to terminate. However, we lost a great deal of information about the identities of 
objects that is very useful in the analysis of programs. In this section, we examine more precise 
abstractions of activation labels that yield better results in the analysis of programs. 

In Chapter 3, we saw that activation labels were composed of a sequence of expression labels 
separated by '.', where each expression label was the label of a particular function application in 
the program. 

The abstraction of activation labels should preserve some information about the standard activation
labels. In fact, we would like abstract activation labels to be exactly the same as standard activation
labels except in recursive invocations of functions. It is safe to do so because the set of such labels
is bounded by the size of the static call graph of a program. Figure 6.1 shows an activation tree
consisting solely of non-recursive procedure calls.

Figure 6.2 shows the static call-graph of three procedures, f , g, and h, where g is a recursive 
procedure, and the corresponding activation tree showing the structure of the activation labels of 
recursive calls to g. We would like to distinguish the activations of the initial application of g 
inside procedure f from the recursive applications of g inside procedure g. We can capture this 
by abstracting sets of standard activation labels as regular expressions. For example, the abstract
activation label $1.2^{+}$ would represent the following set of activation labels:

\[ \{1.2,\ 1.2.2,\ 1.2.2.2,\ \ldots\} \]

Likewise, the abstract activation label $1.2^{*}.3$ represents

\[ \{1.3,\ 1.2.3,\ 1.2.2.3,\ \ldots\} \]

which are the activation labels of the calls to procedure h.

Figure 6.1: Nonrecursive activation tree

Figure 6.2: Call graph of a recursive procedure

We can think of the program's call graph (which can be statically determined because KID - is 
a first order language) as a finite automaton that accepts some set of strings. These strings are 
the standard activation labels. Every function represents a state in the automaton, and every 
application primitive represents a labeled edge. Every state in the automaton is an accept state. 
Our improved abstract activation labels for a program are the minimal regular expressions accepted 
by the finite automaton derived from the program's call-graph. 

The improved AL domain, shown below, consists of regular expressions that match all possible 
concrete activation labels. Abstract activation labels consist of an activation label paired with an 
expression label using ".", the disjunction of two activation labels, or the zero or more repetitions 
of an activation label. 



\[ a \in AL = \epsilon \mid (AL.L) \mid (AL + AL) \mid (AL)^{*} \]




As in the standard domain of activation labels, the abstract activation label domain is a flat 
domain — all abstract activation labels are above the bottom element, but each one is incomparable 
with all of the others. The abstraction function that maps a standard activation label into an 
abstract activation label chooses the regular expression that accepts that particular activation 
label. 

Abstract object labels will now consist of pairs of our new abstract activation labels and the static 
MakeTuple labels. Abstract object references are still sets of abstract object labels. 

We also extend the function environment domain to map function names to mappings from products 
of abstract values, stores, and activation labels to pairs of an abstract value and a store. 

\[ \Phi \in \mathit{FEnv} = F \rightarrow ((V^{n} \times \mathit{Store} \times AL) \rightarrow (V \times \mathit{Store})) \]

6.2 Example Abstraction Operators for Activation Labels 

The function that abstracts activation labels depends on program structure. Abstract activation 
labels form equivalence classes for different paths through the call graph. The domain of abstract 
activations corresponds to the minimal set of regular expressions that name all the paths that start 
at the root of the call graph and end at each node in the call graph. 

We can define an abstraction function for the program whose call graph is shown in Figure 6.1.
This program has four procedures, p, q, r, and s, in addition to $f_0$, and five function application
expressions with labels $k_1$, $k_2$, $k_3$, $k_4$ and $k_5$.

The function that we need in the abstract interpreter takes an abstract label and the expression
label of an application expression, and returns a new abstract activation label. This function
simulates a DFA where there is a state for each acyclic path to each function, and transitions are
taken on the labels of application expressions. For example, the next activation label function
$\mathcal{NAL}$ for the program in Figure 6.1 looks like:



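\[
\begin{array}{lcl}
\mathcal{NAL}(\epsilon, k_1) & = & k_1 \\
\mathcal{NAL}(k_1, k_2) & = & k_1.k_2 \\
\mathcal{NAL}(k_1, k_3) & = & k_1.k_3 \\
\mathcal{NAL}(k_1.k_2, k_4) & = & k_1.k_2.k_4 \\
\mathcal{NAL}(k_1.k_2, k_5) & = & k_1.k_2.k_5 \\
\mathcal{NAL}(k_1.k_3, k_4) & = & k_1.k_3.k_4 \\
\mathcal{NAL}(k_1.k_3, k_5) & = & k_1.k_3.k_5 \\
\mathcal{NAL}(a, k) & = & \top \quad \text{otherwise}
\end{array}
\]

Essentially, procedure q was split into two states, depending on whether it had been reached by
the application labeled $k_2$ or the one labeled $k_3$.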
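
A next activation label function of this kind is a finite transition table, so it can be sketched as a
dictionary lookup. The Python below is a minimal sketch under stated assumptions: activation labels
are encoded as dotted strings, the empty label as `""`, and the top element as the string `"TOP"`; the
names `nal` and `abstract_path` are hypothetical and the table is transcribed from the example above.

```python
from typing import Dict, Iterable, Tuple

TOP = "TOP"    # assumed stand-in for the top element returned "otherwise"
EPSILON = ""   # assumed stand-in for the empty activation label

# (current abstract activation label, application label) -> next label
TRANSITIONS: Dict[Tuple[str, str], str] = {
    (EPSILON, "k1"): "k1",
    ("k1", "k2"): "k1.k2",
    ("k1", "k3"): "k1.k3",
    ("k1.k2", "k4"): "k1.k2.k4",
    ("k1.k2", "k5"): "k1.k2.k5",
    ("k1.k3", "k4"): "k1.k3.k4",
    ("k1.k3", "k5"): "k1.k3.k5",
}


def nal(activation: str, label: str,
        table: Dict[Tuple[str, str], str] = TRANSITIONS) -> str:
    """Take one DFA transition; any pair not in the table maps to TOP."""
    return table.get((activation, label), TOP)


def abstract_path(labels: Iterable[str],
                  table: Dict[Tuple[str, str], str] = TRANSITIONS) -> str:
    """Abstract a whole standard activation label by folding nal over it."""
    activation = EPSILON
    for label in labels:
        activation = nal(activation, label, table)
    return activation
```

For instance, `abstract_path(["k1", "k2", "k4"])` walks three transitions of the table, while a call
sequence that does not correspond to a path in the call graph ends in `TOP`.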

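The next activation label function for the program whose call graph is shown in Figure 6.2 is more
interesting, because this program contains recursive calls. We cannot split nodes to distinguish
different paths through recursive calls, because this would lead to an infinite number of nodes. The
function $\mathcal{NAL}$ for this graph is:

\[
\begin{array}{lcl}
\mathcal{NAL}(\epsilon, k_1) & = & k_1 \\
\mathcal{NAL}(k_1, k_2) & = & k_1.k_2^{*} \\
\mathcal{NAL}(k_1.k_2^{*}, k_2) & = & k_1.k_2^{*} \\
\mathcal{NAL}(k_1.k_2^{*}, k_3) & = & k_1.k_2^{*}.k_3 \\
\mathcal{NAL}(a, k) & = & \top \quad \text{otherwise}
\end{array}
\]

In this example, all activations of procedure g have activation label $k_1.k_2^{*}$. If there were more than
one non-recursive path to invoke procedure g, then these would have distinct activation labels.

\[
\begin{array}{l}
\mathcal{E}_A[\![ {}^{k}f(se_1, \ldots, se_n) ]\!]\,\rho\,\sigma\,a\,\Phi = \\
\quad \{\ v_1 = \mathcal{SE}_A[\![ se_1 ]\!]\,\rho; \\
\quad \phantom{\{\ } \quad \vdots \\
\quad \phantom{\{\ } v_n = \mathcal{SE}_A[\![ se_n ]\!]\,\rho; \\
\quad \phantom{\{\ } a' = \mathcal{NAL}(a, k); \\
\quad \phantom{\{\ } (v, \sigma', \Delta^{-}) = \Phi[f][v_1, \ldots, v_n, \sigma, a']; \\
\quad \mathit{in}\ (v, \sigma', \Delta^{-}, \Delta^{v})\ \}
\end{array}
\]

Figure 6.3: Evaluation of procedure calls with improved abstract activation labels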



6.3 Extensions to Abstract Interpreter 

This section describes the way to revise the abstract interpreter to compute improved activation 
labels. The expression evaluator now takes an abstract activation label in addition to the environ- 
ment, store, and function environment that it took before. 

The new evaluation rule for function applications is shown in Figure 6.3. Note that the function
$\mathcal{NAL}$ is used to create a new abstract activation label given the current activation label $a$ and the
expression label $k$. We look in the function environment $\Phi$ to find the value of the body of the
function evaluated with activation label $a'$, which is the abstraction of the current activation label
concatenated with $k$.

The revised abstract interpreter clause that evaluates the MakeTuple primitive is shown in
Figure 6.4. This clause constructs a new object label from the current abstract activation label $a$ and
the expression label $l$. Other than that it is the same as the original abstract interpreter clause for
MakeTuple. All other clauses of the abstract interpreter remain the same, except that activation
labels are passed to the expression evaluator.



6.4 Evaluation of Examples Using Improved Activation Labels 

Figure 6.5 contains an example that we saw earlier where sharing is falsely detected by the abstract 
interpreter using completely static object labels. Let us reexamine this example using our improved 
abstraction of activation labels and object labels. 

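\[
\begin{array}{l}
\mathcal{E}_A[\![ {}^{l}\mathit{MakeTuple}(se_1, \ldots, se_m) ]\!]\,\rho\,\sigma\,a\,\Phi = \\
\quad \{\ v_1 = \mathcal{SE}_A[\![ se_1 ]\!]\,\rho; \\
\quad \phantom{\{\ } \quad \vdots \\
\quad \phantom{\{\ } v_m = \mathcal{SE}_A[\![ se_m ]\!]\,\rho; \\
\quad \phantom{\{\ } v_{tuple} = (\mathit{Tuple}\ v_1, \ldots, v_m); \\
\quad \phantom{\{\ } ol = a : l; \\
\quad \phantom{\{\ } v'_{tuple} = \sigma[ol]; \\
\quad \phantom{\{\ } \sigma' = \sigma[ol \mapsto (v_{tuple} \sqcup v'_{tuple})]; \\
\quad \mathit{in}\ (\{ol\}, \sigma', \emptyset, \bot_{dmap})\ \}
\end{array}
\]

Figure 6.4: Evaluation of tuple allocation with improved abstract activation labels

{ def f(w) =
    { t1 = ^{k_0}g(w);
      w2 = w * 2;
      t2 = ^{k_1}g(w2);
      r = (w * w2);
      t3 = ^{l_3}MakeTuple(t1,t2,r);
     in t3 };
  def g(x) =
    { y = (x-21);
      t = ^{l_0}MakeTuple(x,y);
     in t };
  def f_0() =
    ^{k_2}f(68);
};

Figure 6.5: Example with false sharing

Let us examine the input-output behavior of procedure g first. If g is applied to a number in context
$(\rho, \sigma, a)$, it returns a reference to an object in location $a : l_0$ and a store $\sigma'$ which is derived from
store $\sigma$ as follows:

\[ \sigma' = \sigma[a : l_0 \mapsto (\sigma[a : l_0] \sqcup (\mathit{Tuple}\ N, N))] \]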



Now, let us study the internal behavior of function f when applied to a number and the empty
store in activation $\epsilon$. We evaluate the bindings of the letrec block in the body of f to yield the
environment $\rho'$ and store $\sigma'$:

\[
\rho' = \bot_{Env}\left[
\begin{array}{lcl}
w & \mapsto & N \\
t1 & \mapsto & \{\epsilon.k_0 : l_0\} \\
w2 & \mapsto & N \\
t2 & \mapsto & \{\epsilon.k_1 : l_0\} \\
r & \mapsto & N \\
t3 & \mapsto & \{\epsilon : l_3\}
\end{array}
\right]
\]

\[
\sigma' = \bot_{Store}\left[
\begin{array}{lcl}
\epsilon.k_0 : l_0 & \mapsto & (\mathit{Tuple}\ N, N) \\
\epsilon.k_1 : l_0 & \mapsto & (\mathit{Tuple}\ N, N) \\
\epsilon : l_3 & \mapsto & (\mathit{Tuple}\ \{\epsilon.k_0 : l_0\}, \{\epsilon.k_1 : l_0\}, N)
\end{array}
\right]
\]

Using the more precise abstraction of activation labels, we can distinguish between the tuples to
which t1 and t2 are bound. We can tell that they must be different objects. Therefore, in the
body of procedure $f_0$, we can insert code to deallocate all three tuples allocated, rather than only
two. This degree of precision can be very useful.
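
The effect of pairing activation labels with static labels can be illustrated with a small Python
sketch. It is only an illustration of the idea, not the analysis itself: the function names, the
`improved` flag, and the string encoding of activation labels (`"eps.k0"` for the activation reached
from the empty label by the call labeled $k_0$) are assumptions made for the example.

```python
from typing import FrozenSet, Tuple, Union

# Assumed encoding: a static object label is a string, an improved object
# label is an (activation label, static label) pair.
ObjectLabel = Union[str, Tuple[str, str]]


def object_label(activation: str, static_label: str, improved: bool) -> ObjectLabel:
    """Build the label of an object allocated at static_label in the given activation."""
    return (activation, static_label) if improved else static_label


def may_alias(a: FrozenSet[ObjectLabel], b: FrozenSet[ObjectLabel]) -> bool:
    """Two abstract references may denote the same object if their label sets overlap."""
    return not a.isdisjoint(b)


def t1_and_t2_may_alias(improved: bool) -> bool:
    """Both calls to g allocate at static label l0, but in different activations."""
    t1 = frozenset({object_label("eps.k0", "l0", improved)})
    t2 = frozenset({object_label("eps.k1", "l0", improved)})
    return may_alias(t1, t2)
```

With purely static labels both references collapse to the single label `l0`, so sharing is reported;
with the paired labels the two sets are disjoint.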



6.5 Summary 



There are many ways in which we can abstract activation labels. In this chapter we discussed one 
way to abstract activation labels that improves the effectiveness of the analysis compared to the 
abstracted activation labels that we used in Chapter 4. We use this abstraction for the remainder 
of the thesis, and we use a variation of these abstract activation labels in our implementation of 
the lifetime analysis. 



Chapter 7 



Abstracting and Analyzing Arrays 



The abstract interpretation of arrays is different from that of tuples: the size of an array is computed 
at run-time, while the size of a tuple is fixed at compile-time. Section 7.1 discusses our approach 
to abstracting arrays — we summarize all elements of an array by one abstract value. 

This array abstraction leads to problems determining whether there is sharing among the elements 
of the arrays. Section 7.2 discusses an improved array abstraction that contains an annotation 
informing whether any elements in the array are shared. 

Sometimes it is difficult to define an array using MakeArray, even though the program fits nicely in
the single-assignment paradigm. Id has I-structure arrays to extend the single-assignment paradigm
beyond the functional subset. I-structures are non-functional, single-assignment arrays whose
presence greatly increases the expressiveness of the language, and only slightly increases the
complexity of lifetime analysis. For example, writing a function that finds the inverse of a permutation
takes $O(n^2)$ space and time when written using MakeArray, but can easily be written in $O(n)$ time and
space using I-structures. Section 7.3 discusses the addition of I-structures to our instrumented and
abstract interpreters and their impact on the deallocation safety condition.



7.1 Abstract Interpretation of Arrays 

Arrays are aggregate objects whose size is not determined until run-time. In the interpreter, 
the objects must be represented by structures with a fixed number of components because we 
require abstract interpretation to take a finite amount of time. We summarize the value of an 
array of arbitrary size as an abstract array with a single element. Our array abstraction has a 
single component that represents all of the elements of the concrete array. This single abstract 
element is the least upper bound of the abstraction of all of the concrete array elements. We call 
this summarization spatial summarization. Spatial summarization combines information about an 
uncertain reference or spatial path, not about an uncertain control path. 

For example, consider the following concrete array of tuples:

\[ (\mathit{Array}\ 3, l_1, l_2, l_3) \]

where 3 denotes the length of the array and $l_1$, $l_2$ and $l_3$ are the labels of concrete tuples. The
abstraction of this array would be an abstract array with one element summarizing all of the
elements of the standard array:

\[ (\mathit{Array}\ \{l_1, l_2, l_3\}) \]

The element $\{l_1, l_2, l_3\}$ indicates that the components of the standard array could be any one of
the abstract tuples named by $l_1$, $l_2$ or $l_3$.

\[ \mathit{Abs}_{Array}((\mathit{Array}\ n, v_1, \ldots, v_n)) = (\mathit{Array}\ \bigsqcup_{1 \le i \le n} v_i) \]

Figure 7.1: Array abstraction operator

\[ (\mathit{Array}\ v_0) \sqcup_{Array} (\mathit{Array}\ v_1) = (\mathit{Array}\ v_0 \sqcup_{V} v_1) \]

Figure 7.2: Array least upper bound operator

\[ (\mathit{Array}\ v_0) \sqsubseteq_{Array} (\mathit{Array}\ v_1) = v_0 \sqsubseteq_{V} v_1 \]

Figure 7.3: Array ordering operator
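
Spatial summarization and the operators of Figures 7.1 to 7.3 can be sketched in Python for the
special case where every array element is a reference, so that an abstract element is simply a set of
labels and the least upper bound is set union. The class `AbsArray` and the function names are
hypothetical, and restricting elements to label sets is an assumption of this sketch.

```python
from dataclasses import dataclass
from functools import reduce
from typing import FrozenSet, Iterable, Sequence


@dataclass(frozen=True)
class AbsArray:
    """Abstract array: one element that summarizes every concrete element."""
    element: FrozenSet[str]


def abstract_array(length: int, elements: Sequence[str]) -> AbsArray:
    """Abstraction operator: forget the length, join all element labels."""
    if len(elements) != length:
        raise ValueError("length does not match the number of elements")
    return AbsArray(frozenset(elements))


def lub(a: AbsArray, b: AbsArray) -> AbsArray:
    """Least upper bound: join the two summary elements."""
    return AbsArray(a.element | b.element)


def leq(a: AbsArray, b: AbsArray) -> bool:
    """Ordering: a is below b when its summary element is below b's."""
    return a.element <= b.element


def lub_all(arrays: Iterable[AbsArray]) -> AbsArray:
    """Join any number of abstract arrays, starting from the empty summary."""
    return reduce(lub, arrays, AbsArray(frozenset()))
```

Applied to the example above, `abstract_array(3, ["l1", "l2", "l3"])` yields an abstract array whose
single element is the set of all three labels.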

If we had subscript range information, we might be able to abstract the elements of an array into 
a small number of elements that represent the values that could be present in subregions of the 
array under the standard interpretation. In that case, an array would be represented as a set of 
intervals and the abstract values that summarize the components of the standard array contained 
in those intervals. The use of range information during abstract interpretation is an area for further 
research. 

7.1.1 The Abstract Array Domain 

We add the following definition of arrays to our abstract domains, and revise the definition of the 
abstract store and storable value domains. 

\[
\begin{array}{lcll}
v_{array} \in \mathit{Array} & = & (\mathit{Array}\ V) & \text{Arrays} \\
sv \in SV & = & \mathit{Tuple} + \mathit{Array} & \text{Storable Values} \\
\sigma \in \mathit{Store} & = & L \rightarrow SV & \text{Stores}
\end{array}
\]

Figure 7.1 contains the function $\mathit{Abs}_{Array}$, which maps standard array values into abstract array
values. Figure 7.2 contains the least upper bound operator on abstract arrays, and Figure 7.3
contains the ordering operator for abstract arrays.

These domains, along with the added ordering and abstraction operators, allow us to revise the 
abstract interpreter to model arrays. 

7.1.2 Abstracting the Array Primitives 

The following two clauses, 7.1 and 7.2, give the abstracted evaluation rules for the array primitives.
As in the standard interpreter, the MakeArray primitive is subscripted with the name of a function
$f_i$ and takes a length value $n$ and $r$ values to be passed to the calls to $f_i$. The length $n$ is ignored,
because abstract arrays all contain a single component. The abstract interpretation of this primitive
uses the static label $l$ of the primitive directly as the object label of the abstract array created. In
this way, it resembles the abstract interpretation of the MakeTuple primitive.

First, we compute the value of the application of function $f_i$ to the input value consisting of
the value $N$, the $r$ input values, and the current store. As with the interpretation of function
applications, we look up the return value of the function application in the function environment
and add the input value to the interesting domain of function $f_i$ in the new domain map delta $\Delta^{v}$.
We use the result value of the function application as the representative element value of the array.

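\[
\begin{array}{l}
\mathcal{E}_A[\![ {}^{l}\mathit{MakeArray}_{f_i}(se_0, se_1, \ldots, se_r) ]\!]\,\rho\,\sigma\,a\,\Phi = \qquad (7.1) \\
\quad \{\ v_1 = \mathcal{SE}_A[\![ se_1 ]\!]\,\rho; \\
\quad \phantom{\{\ } \quad \vdots \\
\quad \phantom{\{\ } v_r = \mathcal{SE}_A[\![ se_r ]\!]\,\rho; \\
\quad \phantom{\{\ } a' = \mathcal{NAL}(a, l); \\
\quad \phantom{\{\ } (u, \sigma', \Delta^{-}) = \Phi[f_i][N, v_1, \ldots, v_r, \sigma, a']; \\
\quad \phantom{\{\ } v_{array} = (\mathit{Array}\ u); \\
\quad \phantom{\{\ } ol = a : l; \\
\quad \phantom{\{\ } v'_{array} = \sigma'[ol]; \\
\quad \phantom{\{\ } \sigma'' = \sigma'[ol \mapsto (v_{array} \sqcup v'_{array})]; \\
\quad \phantom{\{\ } \Delta^{v}[f_i] = \{(N, v_1, \ldots, v_r, \sigma, a')\}; \\
\quad \mathit{in}\ (\{ol\}, \sigma'', \Delta^{-}, \Delta^{v})\ \}
\end{array}
\]

The abstract interpretation of the Fetch primitive is very similar to the abstraction of the $\mathit{Select}_i$
primitive. Fetch takes two values: a set $ls$ of labels and an index. Fetch takes the least upper
bound of the arrays to which each of the labels in $ls$ is bound in store $\sigma$, and then returns the
element value of that array.

\[
\begin{array}{l}
\mathcal{E}_A[\![ {}^{k}\mathit{Fetch}(se_1, se_2) ]\!]\,\rho\,\sigma\,a\,\Phi = \qquad (7.2) \\
\quad \{\ ls = \mathcal{SE}_A[\![ se_1 ]\!]\,\rho; \\
\quad \phantom{\{\ } (\mathit{Array}\ v) = \bigsqcup_{ol \in ls} \sigma[ol]; \\
\quad \mathit{in}\ (v, \sigma, \emptyset, \bot_{dmap})\ \}
\end{array}
\]

The abstraction of the Bounds primitive is very simple. It ignores its argument and returns an
abstract integer as its result.

\[
\begin{array}{l}
\mathcal{E}_A[\![ {}^{k}\mathit{Bounds}(se) ]\!]\,\rho\,\sigma\,a\,\Phi = \\
\quad \{\ \mathit{in}\ (N, \sigma, \emptyset, \bot_{dmap})\ \}
\end{array}
\]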

Now that we have an abstraction of array values and have augmented the abstract interpreter with 
clauses for the array primitives, let us examine some array program examples and see how our 
lifetime analysis algorithm performs. 

7.1.3 Example Array Programs 

The first example we look at is shown below. It consists of a function f1 that takes three numbers and
constructs an array containing a different tuple in each element. The function g1 is the function
that defines each of the array elements.

def g1 (i, x, y) =
  ^{l_0}MakeTuple(x,y,i);

def f1 (n, x, y) =
  { a = ^{l_1}MakeArray_{g1}(n, x, y);
   in 3 };

Our abstract interpretation of this example computes the following representation for the value of
$\rho$ and $\sigma$ within f1's body:

\[
\begin{array}{lcl}
\rho[a] & = & \{l_1\} \\
\sigma[l_1] & = & (\mathit{Array}\ \{l_0\}) \\
\sigma[l_0] & = & (\mathit{Tuple}\ N, N, N)
\end{array}
\]

That is, variable a is bound to an array labeled $l_1$ which contains a three-tuple of numbers labeled
$l_0$ as its element.

We can determine that the lifetime of the array labeled $l_1$ is bounded by the lifetime of f1, because
$l_1$ is not reachable from the labels inherited by f1 (the empty set) and because $l_1$ is not returned
as part of the result of f1: a single number is returned. The same is true of the tuple labeled
$l_0$: its lifetime is bounded by that of procedure f1.

Now consider the following example, which is similar to the previous one, except that the tuple
labeled $l_0$ is allocated by procedure f2 and passed to the procedure g2 that computes the elements
of the array $l_1$. Thus, f2 allocates an array where a tuple is shared by each of the elements.

def f2 (n, x, y) =
  { t = ^{l_0}MakeTuple(x,y,4);
    a = ^{l_1}MakeArray_{g2}(n, t);
   in 3 };

def g2 (i, t) = t;

The representation computed for the value of a is the same as in the first example:

\[
\begin{array}{lcl}
\rho[a] & = & \{l_1\} \\
\sigma[l_1] & = & (\mathit{Array}\ \{l_0\}) \\
\sigma[l_0] & = & (\mathit{Tuple}\ N, N, N)
\end{array}
\]

The variable a is bound to an array labeled $l_1$ containing a tuple or tuples labeled $l_0$. In this
example, we can determine that the lifetimes of array $l_1$ and tuple $l_0$ are bounded by the lifetime
of procedure f2.

In both of these examples, we can verify that it is safe to deallocate the array bound to a when
either f1 or f2 terminates, because label $l_1$ is allocated within the body of f1 and f2, $l_1$ does not
escape, and $l_1$ cannot be deallocated elsewhere.

There is one fact we have not been able to uncover using our lifetime analysis, and that is that in 
the first example, each element of the concrete array is distinct, and that in the second example, 
each element of the concrete array is the same. If there is no sharing, then the compiler may insert 
code to deallocate each element of the array. If there is sharing, then deallocation of the elements 
becomes a little more difficult because we cannot deallocate any element more than once. We can 
work around this problem with run-time support. The run-time code that deallocates the elements 
of an array must keep track of the objects it has deallocated to ensure that it deallocates each 
unique element of the array exactly once. 
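
The run-time bookkeeping described here can be sketched as follows. This Python fragment is only an
illustration of the idea: the function name `deallocate_elements`, the `free` callback standing in
for the storage manager, and the use of object identity to recognize a shared element are all
assumptions of the sketch rather than part of the run-time system.

```python
from typing import Callable, Sequence, Set


def deallocate_elements(array: Sequence[object],
                        free: Callable[[object], None]) -> int:
    """Free each distinct element object exactly once; return how many were freed."""
    freed: Set[int] = set()       # identities of the objects already freed
    for element in array:
        key = id(element)         # identity, not equality: equal tuples may be distinct objects
        if key not in freed:
            freed.add(key)
            free(element)
    return len(freed)
```

For an array whose slots all refer to one tuple the callback runs once; for an array of distinct
tuples, even structurally equal ones, it runs once per slot. The price of this safety is the extra
set maintained while the array is traversed.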

The first example actually has no sharing. But because the abstraction does not yield sharing
information, the compiler must generate code that carefully deallocates each distinct element of
the array a for both f1 and f2. This strategy of code generation is safe, but is less efficient than if
we could determine that there was no sharing of elements in procedure f1.

7.2 Sharing Analysis in Arrays 

In the previous section, we defined an abstraction of arrays and showed how to perform lifetime
analysis on programs containing arrays. We also saw that the analysis does not capture an important
fact about the arrays, namely, whether the elements of the array are shared or not.

This section investigates a change to the representation of abstract arrays in the abstract interpreter
so that the analysis yields sharing information. The approach we take enables us to determine
whether two elements of an array are completely distinct or whether they may be shared at some
level.

\[
\begin{array}{l}
(\mathit{Array}\ s_0, v_0) \sqcup_{Array} (\mathit{Array}\ s_1, v_1) = (\mathit{Array}\ (s_0 \sqcup_{S} s_1), (v_0 \sqcup_{V} v_1)) \\[4pt]
\mathit{Unshared} \sqcup_{S} \mathit{Unshared} = \mathit{Unshared} \\
\mathit{Unshared} \sqcup_{S} \mathit{Shared} = \mathit{Shared} \\
\mathit{Shared} \sqcup_{S} \mathit{Unshared} = \mathit{Shared} \\
\mathit{Shared} \sqcup_{S} \mathit{Shared} = \mathit{Shared}
\end{array}
\]

Figure 7.4: Improved array least upper bound operators



7.2.1 Modeling Sharing in the Abstract Array Domain 

In order to track the sharing of array elements, we add an annotation to each abstract array that
indicates whether the array components may be shared or not. This sharing annotation is drawn
from domain $S$, which consists of two values: Shared and Unshared, where
$\mathit{Unshared} \sqsubseteq \mathit{Shared}$.

\[
\begin{array}{lcll}
s \in S & = & \mathit{Shared} + \mathit{Unshared} & \text{Sharing Predicate} \\
v_{array} \in \mathit{Array} & = & (\mathit{Array}\ S, V) & \text{Arrays} \\
sv \in SV & = & \mathit{Tuple} + \mathit{Array} & \text{Storable Values} \\
\sigma \in \mathit{Store} & = & L \rightarrow SV & \text{Stores}
\end{array}
\]



If we have an array of nested structures, we take Unshared to mean that the structures stored 
in each element of the array are completely unaliased from the structures stored in every other 
element of the array. 

Figure 7.4 contains the least upper bound operator on abstract arrays, Figure 7.5 contains the 
ordering operator for abstract arrays and Figure 7.6 contains the abstraction operator for arrays 
with sharing. 

7.2.2 Abstracting the Array Primitives with Sharing 

The clauses of the interpreter must be augmented to compute the proper sharing information.
The only change is to MakeArray, which generates either a shared array or an unshared array. A
call to $\mathit{MakeArray}_{f_i}$ generates an unshared array only if all of the labels reachable from the
application of procedure $f_i$ are disjoint from the locations reachable from the arguments to the call
to MakeArray. When the two sets are disjoint, none of the inherited labels is reachable from the
element value resulting from the application of $f_i$, so all of the labels reachable from the element
value must be allocated within the application, and none may be shared among elements of the array.

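\[
\begin{array}{l}
\langle \mathit{Array}\ s_0, v_0 \rangle \sqsubseteq_{Array} \langle \mathit{Array}\ s_1, v_1 \rangle = (s_0 \sqsubseteq_S s_1) \wedge (v_0 \sqsubseteq_V v_1) \\[4pt]
\mathit{Unshared} \sqsubseteq_S \mathit{Unshared} = \mathit{True} \\
\mathit{Unshared} \sqsubseteq_S \mathit{Shared} = \mathit{True} \\
\mathit{Shared} \sqsubseteq_S \mathit{Unshared} = \mathit{False} \\
\mathit{Shared} \sqsubseteq_S \mathit{Shared} = \mathit{True}
\end{array}
\]

Figure 7.5: Improved array ordering operators

\[
\begin{array}{l}
\mathit{Abs}_{Array}(\langle \mathit{Array}\ n, v_1, \ldots, v_n \rangle) = \\
\quad \{\ s = \mathbf{if}\ \bigwedge_{i \neq j} v_i \neq v_j\ \mathbf{then}\ \mathit{Unshared}\ \mathbf{else}\ \mathit{Shared}; \\
\quad\ \ v = \bigsqcup_{1 \le i \le n} v_i;\ \}
\end{array}
\]

Figure 7.6: Abstraction operator for arrays with sharing

\[
\begin{array}{l}
\mathcal{E}_A[\![\ \mathit{MakeArray}_{f_i}(se_0, se_1, \ldots, se_r)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ v_1 = \mathcal{SE}_A[\![ se_1 ]\!]\rho; \\
\qquad \vdots \\
\quad\ \ v_r = \mathcal{SE}_A[\![ se_r ]\!]\rho; \\
\quad\ \ \alpha' = NM(\alpha, l); \\
\quad\ \ \langle u, \sigma', \Delta^{-} \rangle = \Phi[f_i][N, v_1, \ldots, v_r, \sigma, \alpha']; \\
\quad\ \ I = \bigcup_{v_i} \mathit{Reachable}(v_i, \sigma); \\
\quad\ \ R = \mathit{Reachable}(u, \sigma'); \\
\quad\ \ v_{array} = \mathbf{if}\ R \cap I = \emptyset \\
\qquad\qquad\quad \mathbf{then}\ \langle \mathit{Array}\ \mathit{Unshared}, u \rangle \\
\qquad\qquad\quad \mathbf{else}\ \langle \mathit{Array}\ \mathit{Shared}, u \rangle; \\
\quad\ \ ol = \alpha : k; \\
\quad\ \ v'_{array} = \sigma'[ol]; \\
\quad\ \ \sigma'' = \sigma'[ol \rightarrow (v_{array} \sqcup v'_{array})]; \\
\quad\ \ \Delta^{v}[f_i] = \{(N, v_1, \ldots, v_r, \sigma, \alpha')\}; \\
\quad\ \ \mathbf{in}\ \langle \{ol\}, \sigma'', \Delta^{-}, \Delta^{v} \rangle\ \}
\end{array}
\]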
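The disjointness test at the heart of the MakeArray clause can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the interpreter itself: stores are modeled as dictionaries from a label to the set of labels that object refers to, abstract values are plain sets of labels, and the function names `reachable` and `make_array_sharing` are hypothetical.

```python
def reachable(roots, store):
    """Return every label reachable from `roots` by following `store`.

    `store` maps a label to the set of labels its object refers to, a
    simplified stand-in for abstract storable values.
    """
    seen = set()
    work = list(roots)
    while work:
        label = work.pop()
        if label in seen:
            continue
        seen.add(label)
        work.extend(store.get(label, ()))
    return seen


def make_array_sharing(arg_values, store_in, element_value, store_out):
    """Classify a new array as 'Unshared' or 'Shared'.

    I collects the labels reachable from the arguments in the incoming store;
    R collects the labels reachable from the element value in the store that
    results from applying the element procedure.
    """
    inherited = set()
    for value in arg_values:
        inherited |= reachable(value, store_in)
    result = reachable(element_value, store_out)
    return "Unshared" if not (result & inherited) else "Shared"
```

For instance, when every argument is a number (an empty label set) and the element procedure returns a freshly allocated tuple, `I` is empty and the array is classified as unshared; when an argument already refers to the tuple that the element procedure returns, `R` and `I` intersect and the array is classified as shared.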

The only change to the Fetch evaluation clause is to make it fetch components from abstract arrays 
annotated with sharing information. 

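\[
\begin{array}{l}
\mathcal{E}_A[\![\ {}^{k}\mathit{Fetch}(se_1, se_2)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ ls = \mathcal{SE}_A[\![ se_1 ]\!]\rho; \\
\quad\ \ \langle \mathit{Array}\ s, v \rangle = \bigsqcup_{ol \in ls} \sigma[ol]; \\
\quad\ \ \mathbf{in}\ \langle v, \sigma, \emptyset, \bot_{DMap} \rangle\ \}
\end{array}
\]

No change is needed in the clause for the Bounds primitive.

7.2.3 Reexamining the Array Examples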

The first example we considered in Section 7.1.3 had no sharing in it. The example is shown below 
for reference. 

def g1 (i, x, y) =
  ^l0 MakeTuple(x, y, i);

def f1 (n, x, y) =
  { a = ^l1 MakeArray_g1(n, x, y);
    in a };

Our new abstract interpretation should discover that there can be no sharing in this example. The
arguments to the call to MakeArray are:

\[
\begin{array}{rcl}
n & = & N \\
v_x & = & N \\
v_y & = & N
\end{array}
\]

assuming that variables x and y are bound to integers. Thus, there are no locations reachable from
the arguments to the call to MakeArray.

\[ I = \emptyset \]

The result of the call to g1 is a reference to a tuple in location $l_0$, so the set of reachable locations
is $\{l_0\}$:

\[ R = \{l_0\} \]

and the intersection of the inherited locations and the reachable locations is the empty set. Therefore,
there can be no sharing between elements of the array. There is no way, without side effects,
that different elements of the array could end up sharing the same location.

We end up computing the following representation for the value of a:

\[
\begin{array}{l}
\rho[a] = \{l_1\} \\
\sigma[l_1] = \langle \mathit{Array}\ \mathit{Unshared}, \{l_0\} \rangle \\
\sigma[l_0] = \langle \mathit{Tuple}\ N, N, N \rangle
\end{array}
\]

That is, a is bound to an array labeled $l_1$, containing a tuple or tuples labeled $l_0$ as its elements,
and the tuples in different elements of the array are guaranteed to be distinct.

If we determine that the above array and its components are dead in some context, then we can 
insert code that deallocates the array and all of its components without having to insert any run- 
time code to detect sharing among the components. 

The second example from Section 7.1.3, which created an array containing shared elements, follows:

def f2 (n, x, y) =
  { t = ^l0 MakeTuple(x, y, 4);
    a = ^l1 MakeArray_g2(n, t);
    in a };

def g2 (i, t) = t;

In this case, the arguments to MakeArray are:

\[
\begin{array}{l}
\rho[n] = N \\
\rho[t] = \{l_0\}
\end{array}
\]

Therefore, the inherited locations are the set $\{l_0\}$:

\[ I = \{l_0\} \]

The application of procedure g2 yields a reference to location $l_0$, so the set of reachable locations
$R$ is:

\[ R = \{l_0\} \]

Since the intersection of $I$ and $R$ is non-empty, the array representation is shared.

\[
\begin{array}{l}
\rho[a] = \{l_1\} \\
\sigma[l_1] = \langle \mathit{Array}\ \mathit{Shared}, \{l_0\} \rangle \\
\sigma[l_0] = \langle \mathit{Tuple}\ N, N, N \rangle
\end{array}
\]

The variable a is bound to an array labeled $l_1$ containing a tuple or tuples labeled $l_0$, where there
is some sharing among the elements of the array.

In this example, we can still insert code to deallocate the array and its components if we determine 
that they are dead in some context, but we have to insert run-time code to detect sharing. 
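The practical difference between the two annotations shows up in the deallocation code. The following Python sketch models that difference; it is not the code the compiler emits. The heap is assumed to be a dictionary from an array label to the list of element labels, elements are assumed to be flat (no nested structures), and `free` is a hypothetical callback standing in for the run-time deallocator.

```python
def free_unshared(array_label, heap, free):
    """Free an array whose elements are known to be distinct.

    No run-time bookkeeping is needed: each element is freed exactly once
    because the analysis guarantees that no two slots alias.
    """
    for element in heap[array_label]:
        free(element)
    free(array_label)


def free_shared(array_label, heap, free):
    """Free an array whose elements may alias one another.

    A run-time set of already freed elements prevents freeing an element
    twice when several slots refer to the same object.
    """
    seen = set()
    for element in heap[array_label]:
        if element not in seen:  # run-time sharing check
            seen.add(element)
            free(element)
    free(array_label)
```

With an array such as the one built by f2, where every slot refers to the same tuple, `free_shared` releases the tuple once and then the array, whereas applying `free_unshared` to it would release the tuple once per slot.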



7.3 Modeling I-Structures 

This section extends the instrumented and abstracted interpreters with I-structure array data type
primitives. Although Id has both I-structure algebraic types and arrays, we discuss only I-structure
arrays; the implementation of other I-structure types follows directly from the model of I-structure
arrays. We use the array value domain, but add two new array operators, MakeIArray and Store,
to KID⁻. The first subsection presents the standard semantics of I-structure arrays, and the second
subsection presents the abstract semantics extended with I-structure array operators.

In KID⁻, I-structures are created using the primitive MakeIArray with all slots empty, or bound
to $\bot$. Elements of the array may be filled in using the primitive Store and dereferenced using the
primitive Fetch. It is an error to store more than one value into a single I-structure array slot,
although we do not check this in the interpreter.

7.3.1 I-Structures in the Instrumented Interpreter 

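This section defines the interpreter clauses for the primitives MakeIArray and Store. The primitive
MakeIArray takes one value: a simple expression that evaluates to length n. It returns a newly
allocated array of length n where each component is initially unbound.

\[
\begin{array}{l}
\mathcal{E}_I[\![\ {}^{k}\mathit{MakeIArray}(se)\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \{\ ol = \alpha : k; \\
\quad\ \ n = \mathcal{SE}[\![ se ]\!]\rho; \\
\quad\ \ v_0 = \bot; \\
\qquad \vdots \\
\quad\ \ v_{n-1} = \bot; \\
\quad\ \ \sigma' = \sigma[ol \rightarrow \langle \mathit{Array}\ n, v_0, \ldots, v_{n-1} \rangle]; \\
\quad\ \ \mathbf{in}\ \langle ol, \sigma', \{(\alpha, ol)\}, \emptyset, \emptyset \rangle\ \}
\end{array}
\]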

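The primitive Store takes three values: a reference ol to an I-structure array a, an index i, and a
value v, and returns the reference to the array and a store that has the ith component of a bound to
v. The interpreter records the fact that a side-effect was done on the object labeled ol by returning
a reference event for ol in activation $\alpha$.

\[
\begin{array}{l}
\mathcal{E}_I[\![\ \mathit{Store}(se_1, se_2, se_3)\ ]\!]\ \rho\,\sigma\,\alpha = \\
\quad \{\ ol = \mathcal{SE}[\![ se_1 ]\!]\rho; \\
\quad\ \ i = \mathcal{SE}[\![ se_2 ]\!]\rho; \\
\quad\ \ v'_i = \mathcal{SE}[\![ se_3 ]\!]\rho; \\
\quad\ \ \langle \mathit{Array}\ n, v_0, \ldots, v_i, \ldots, v_{n-1} \rangle = \sigma[ol]; \\
\quad\ \ \sigma' = \sigma[ol \rightarrow \langle \mathit{Array}\ n, v_0, \ldots, v_i \sqcup v'_i, \ldots, v_{n-1} \rangle]; \\
\quad\ \ \mathbf{in}\ \langle ol, \sigma', \emptyset, \emptyset, \{(\alpha, ol)\} \rangle\ \}
\end{array}
\]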

Now that we have seen the definition of the instrumented interpreter clauses for handling
I-structures, we can go on to the definition of the abstracted I-structure domains and the abstracted
I-structure interpreter clauses.

7.3.2 I-Structures in the Abstract Interpreter 

In the abstracted interpreter, we use the array domains with sharing information. Whenever we 
store into an I-structure array, we make that array be a shared array. 

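The following two clauses give the abstracted evaluation rules for I-structure array data structure
primitives. The primitive MakeIArray constructs an abstract array with no sharing whose components
are undefined. The primitive Store updates the component of the array to the least upper
bound of the current array element and the new value. Storing into an I-structure array may
potentially introduce sharing, so we upgrade the sharing indicator of the array to Shared when a
Store is performed.

\[
\begin{array}{l}
\mathcal{E}_A[\![\ {}^{k}\mathit{MakeIArray}(se_1)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ v_{array} = \langle \mathit{Array}\ \mathit{Unshared}, \bot \rangle; \\
\quad\ \ ol = \alpha : k; \\
\quad\ \ v'_{array} = \sigma[ol]; \\
\quad\ \ \sigma' = \sigma[ol \rightarrow (v_{array} \sqcup v'_{array})]; \\
\quad\ \ \mathbf{in}\ \langle \{ol\}, \sigma', \emptyset, \bot_{DMap} \rangle\ \} \\[6pt]
\mathcal{E}_A[\![\ \mathit{Store}(se_1, se_2, se_3)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ ls = \mathcal{SE}_A[\![ se_1 ]\!]\rho; \\
\quad\ \ v = \mathcal{SE}_A[\![ se_3 ]\!]\rho; \\
\quad\ \ \sigma'[ol] = \left\{ \begin{array}{ll} \sigma[ol] \sqcup \langle \mathit{Array}\ \mathit{Shared}, v \rangle & \text{if } ol \in ls \\ \sigma[ol] & \text{otherwise} \end{array} \right. \\
\quad\ \ \mathbf{in}\ \langle ls, \sigma', \emptyset, \bot_{DMap} \rangle\ \}
\end{array}
\]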
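The abstract Store clause performs a weak update: it never overwrites the element summary, it only joins into it. A minimal Python sketch of that behavior follows. The representation is an assumption made for illustration: each abstract array is a pair of a sharing tag and a frozenset of labels, and the function name `abstract_store_update` is hypothetical.

```python
def abstract_store_update(store, targets, value):
    """Return a new abstract store after a Store into every label in `targets`.

    Each abstract array is modeled as (sharing, element_labels). The old
    element summary is joined with the stored value, and the annotation
    becomes 'Shared' because the join with Shared is always Shared.
    """
    updated = dict(store)
    for label in targets:
        _, element = store[label]
        updated[label] = ("Shared", element | value)
    return updated
```

Starting from an empty I-structure `("Unshared", frozenset())`, two successive stores of different tuples leave the array marked Shared with both tuple labels in its element summary, while the original store is left untouched.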
7.3.3 Effect Of I-Structures on Deallocation Safety Conditions

The introduction of I-structures into KID⁻ has introduced a new mechanism by which objects can
escape from a given activation. Previously, objects could be passed into an activation through the 
environment as inherited values or arguments, or they could be passed out of the activation as part 
of the results. Now, an I-structure with empty slots can be passed into an activation, and objects 
allocated within the activation can be stored into the empty slots and escape from the activation. 
Thus it is now possible for objects allocated within an activation to escape via the inherited objects. 

However, this new path for escaping objects does not significantly change the criteria that we use 
to decide that a particular activation contains an object's lifetime. We now must determine that 
an object is not reachable from the result of an expression or from the objects inherited from the 
surrounding environment after the expression has executed. The only change in the tests is which 
store is used to determine reachability from inherited objects. Previously, we used the incoming
store to determine reachability; now we must use the outgoing store to determine reachability.

Let us again consider the canonical letrec block with deallocation commands that we would like 
to verify: 

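\[
\begin{array}{l}
e = \{\ x_1 = e_1; \\
\qquad\ \vdots \\
\qquad\ x_n = e_n; \\
\qquad\ \mathit{Dealloc}(y_1) \\
\qquad\ \vdots \\
\qquad\ \mathit{Dealloc}(y_m) \\
\quad\ \ \mathbf{in}\ x_j\ \}
\end{array}
\]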

where the environment, store, and function environment in which e is to be evaluated are $\rho$, $\sigma$, and
$\Phi$, respectively.

We compute environment $\rho'$ and store $\sigma'$, the resulting environment and store for the block bindings,
$\Delta^{-}$, the set of labels deallocated by the block bindings, and $v$, which is the result of the evaluation
of the expression, as shown below. In addition, we compute $R$, the set of object labels reachable
from the result of the expression, and $I$, the set of object labels reachable from the free variables
of the expression.





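\[
\begin{array}{rcl}
\rho_0 & = & \rho[\bot/x_1, \ldots, \bot/x_n] \\
\langle \rho', \sigma', \Delta^{-}, \Delta^{v} \rangle & = & \mathit{EvalBindings}_A([\![ x_1 = e_1; \ldots; x_n = e_n ]\!], \Phi, \rho_0, \sigma, \bot_{DMap}) \\
v & = & \rho'[x_j] \\
R & = & \mathit{Reachable}(v, \sigma') \\
I & = & \bigcup_{y \in FV(e)} \mathit{Reachable}(\rho'[y], \sigma')
\end{array}
\]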



Previously, $I$ was computed with respect to $\sigma$, the incoming store, and now it is computed with
respect to $\sigma'$, the result store.
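A small Python sketch shows why the choice of store matters. It is an illustration under simplifying assumptions, not the verification algorithm itself: stores map a label to the set of labels that object refers to, and the names `reachable`, `may_deallocate`, `store_in`, and `store_out` are hypothetical.

```python
def reachable(roots, store):
    """Labels reachable from `roots`; `store` maps a label to the labels it holds."""
    seen, work = set(), list(roots)
    while work:
        label = work.pop()
        if label not in seen:
            seen.add(label)
            work.extend(store.get(label, ()))
    return seen


def may_deallocate(label, result, inherited, store):
    """True if `label` is unreachable from the result and the inherited objects.

    Passing the outgoing store makes objects that were stored into inherited
    I-structures during the block show up as escaping.
    """
    live = reachable(result, store) | reachable(inherited, store)
    return label not in live


# Hypothetical block: it inherits I-structure l0 and stores a fresh tuple l1 into it.
store_in = {"l0": set()}
store_out = {"l0": {"l1"}, "l1": set()}
```

With `store_out`, the test reports that `l1` escapes through the inherited I-structure `l0`; with `store_in`, the same test would wrongly conclude that `l1` could be deallocated.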

7.3.4 Example I-Structure Program 

We will now execute an example I-structure program to see how lifetime analysis performs in the
presence of side-effects. In the following I-structure program, procedure f allocates an empty
I-structure and passes it to procedure g, which fills it in with two tuples.

def g (a) =
  ^k2 { t1 = ^l1 MakeTuple(6.823, 6.847);
        t2 = ^l2 MakeTuple(6.847, 6.823);
        x1 = Store(a, 0, t1);
        x2 = Store(a, 1, t1);
        x3 = Store(a, 2, t2);
        in True };

def f =
  ^k0 { a = ^l0 MakeIArray(3);
        v = ^k1 g(a);
        r = Fetch(a, 0);
        in r };

The tuples allocated in procedure g and bound to variables tl and t2 are not returned as part of 
g's result, yet they escape from the body of g. They are stored into the I-structure passed in as g's 
argument. 

The result yielded by executing this program under the instrumented interpreter would be: 

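\[
\begin{array}{l}
\langle\ \epsilon.k_0.k_1.k_2 : l_1, \\
\quad \epsilon.k_0 : l_0 \rightarrow \langle \mathit{Array}\ 3, \epsilon.k_0.k_1.k_2 : l_1, \epsilon.k_0.k_1.k_2 : l_1, \epsilon.k_0.k_1.k_2 : l_2 \rangle, \\
\quad \epsilon.k_0.k_1.k_2 : l_1 \rightarrow \langle \mathit{Tuple}\ 6.823, 6.847 \rangle, \\
\quad \epsilon.k_0.k_1.k_2 : l_2 \rightarrow \langle \mathit{Tuple}\ 6.847, 6.823 \rangle, \\
\quad \{(\epsilon.k_0 : l_0, \epsilon.k_0), (\epsilon.k_0.k_1.k_2 : l_1, \epsilon.k_0.k_1.k_2), (\epsilon.k_0.k_1.k_2 : l_2, \epsilon.k_0.k_1.k_2)\}, \\
\quad \emptyset, \\
\quad \{(\epsilon.k_0 : l_0, \epsilon.k_0), (\epsilon.k_0 : l_0, \epsilon.k_0.k_1.k_2)\}\ \rangle
\end{array}
\]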

The abstract interpreter, using our improved activation labels, would yield the following: 

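\[
\begin{array}{l}
\langle\ \{\epsilon.k_0.k_1.k_2 : l_1, \epsilon.k_0.k_1.k_2 : l_2\}, \\
\quad \epsilon.k_0 : l_0 \rightarrow \langle \mathit{Array}\ \mathit{Shared}, \{\epsilon.k_0.k_1.k_2 : l_1, \epsilon.k_0.k_1.k_2 : l_2\} \rangle, \\
\quad \epsilon.k_0.k_1.k_2 : l_1 \rightarrow \langle \mathit{Tuple}\ N, N \rangle, \\
\quad \epsilon.k_0.k_1.k_2 : l_2 \rightarrow \langle \mathit{Tuple}\ N, N \rangle, \\
\quad \emptyset, \\
\quad \bot_{DMap}\ \rangle
\end{array}
\]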

We lose some information in the abstract domain, because it appears that both tuples escape from
the result of procedure f. This approximation is safe, because nothing that is reachable under
the instrumented semantics appears unreachable under the abstracted semantics. Lifetime analysis 
using the abstract interpreter correctly determines that the two tuples may escape from the body 
of procedure g, even though neither of them is directly returned as part of g's result. 



7.4 Summary 



This chapter described the abstraction of the array domains and how we have to use spatial summarization
to obtain a finite representation of arrays at compile-time. We used this abstract array
domain to extend our lifetime analysis algorithm to handle programs containing arrays. We then
extended the abstracted array domains and abstract interpreter in order to perform sharing analysis 
on array elements, because the compiler can generate more efficient deallocation code if it knows 
that no element of an array is shared. 

We also added I-structure arrays to KID⁻ in this chapter. I-structures increase the expressiveness
of the language and allow us to write some programs more efficiently than if we had to use the 
functional MakeArray construct. I-structures also introduce a new path for objects to escape from 
a control region — objects may escape by being stored into I-structures that were inherited from 
the surrounding context. We showed that our lifetime analysis algorithm correctly handles this
case.

Chapter 8



Algebraic and Recursive Types 



In Chapters 4 and 5 we saw how to summarize the behavior of KID⁻ over tuples, numbers, and
booleans. In Chapter 7 we introduced the notion of spatial summarization which was necessary 
to generalize the values of arrays. Spatial summarization was introduced because, in general, the 
size of an array can only be determined at run-time, and we needed to be able to summarize the 
behavior of a program over all possible arrays. 

In this chapter, we develop an abstraction of algebraic types. The abstraction of non-recursive 
algebraic types is very straightforward. This abstraction is discussed in the first section of this 
chapter. 

The abstraction of recursive algebraic types in many abstract interpreters is very difficult because
the size of the representation of a recursively typed object can grow without bound. We see in
the second section of this chapter, however, that our abstract interpreter does not suffer from this
problem.

Our abstraction of non-recursive algebraic types is adequate for recursive algebraic types as well. 
This abstraction involves a form of spatial summarization because the number of nodes composing 
an object of recursive type can only be known at run-time, but our abstraction compresses it into 
a finite number of nodes at compile-time. 

Although our abstraction of algebraic types is general enough to model any recursive algebraic 
type safely, the only recursive type for which our implementation of the deallocation code insertion 
algorithm can generate deallocation code is lists. We discuss our abstraction of lists in the third 
section of this chapter and compare our abstraction with that of other researchers. 

The spatial summarization that occurs in the abstraction of recursive types makes it difficult to 
insert code to deallocate these objects because there may be sharing between the nodes of the 
objects. We need a better idea about the sharing that occurs between the elements of a recursively 
typed object. We discuss a way to approach this problem in Section 11.1.3. 



8.1 Abstraction of Algebraic Types 

In Chapter 2 we saw that oneofs, or algebraic types, are represented by tagged structures in the
standard interpreter. The tags distinguish instances of the different disjuncts of the algebraic type.
Evaluation of a given expression in different contexts may return different disjuncts of an algebraic
type. For this reason, an abstract oneof value must be a product of each of the possible disjuncts,
rather than a sum, as in the instrumented interpreter. The abstracted value must capture information
about the values resulting from evaluation in all possible contexts. For instance, an expression
that results in an object of type transaction, where:

type transaction = deposit N | withdrawal N;

might return either a reference to a deposit of 19.92, represented by:

\[ \langle {}_{0,2}\ 19.92 \rangle \]

or a withdrawal of 353.0, represented by:

\[ \langle {}_{1,2}\ 353.0 \rangle \]

The abstract interpreter must represent both possibilities in a single value. This expression would
return a reference to the following abstract oneof:

\[ \langle \mathit{Oneof}\ \langle {}_{0}\ N \rangle, \langle {}_{1}\ N \rangle \rangle \]

which represents either a oneof with tag 0 (a deposit) or a oneof with tag 1 (a withdrawal).

The above abstract transaction value is the most defined abstract transaction value. This abstract 
value represents standard values that are either deposits or withdrawals. We can also represent 
transactions that could only be deposits as follows: 

\[ \langle \mathit{Oneof}\ \langle {}_{0}\ N \rangle, \bot \rangle \]

We represent transactions that can only be withdrawals as follows:

\[ \langle \mathit{Oneof}\ \bot, \langle {}_{1}\ N \rangle \rangle \]

Either or both of the components of an abstract transaction structure can be bottom. If an
expression e evaluated in some context C under the abstract interpreter yields a transaction
structure with bottom for the deposit component, then the same expression could never yield a
deposit structure if it was evaluated under the standard interpreter in a context compatible with C.

The following are the abstraction functions that map standard transactions into abstract transactions:

\[
\begin{array}{l}
\mathit{Abs}_{Transaction}(\langle {}_{0,2}\ n \rangle) = \langle \mathit{Oneof}\ \langle {}_{0}\ N \rangle, \bot \rangle \\
\mathit{Abs}_{Transaction}(\langle {}_{1,2}\ n \rangle) = \langle \mathit{Oneof}\ \bot, \langle {}_{1}\ N \rangle \rangle
\end{array}
\]

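This method of summarizing information about algebraic types is very general. As we shall see in
Section 8.2, it even handles recursive types appropriately.

\[
\begin{array}{l}
\mathit{Abs}_{Oneof}(\langle {}_{tag,n}\ v_0, \ldots, v_m \rangle) = \\
\quad \langle \mathit{Oneof}\ \bot, \ldots, \langle {}_{tag}\ \mathit{Abs}_V(v_0), \ldots, \mathit{Abs}_V(v_m) \rangle, \ldots, \bot \rangle
\end{array}
\]

Figure 8.1: Oneof abstraction operator

\[
\begin{array}{l}
\langle \mathit{Oneof}\ d_0, \ldots, d_n \rangle \sqcup_{Oneof} \langle \mathit{Oneof}\ d'_0, \ldots, d'_n \rangle = \\
\quad \langle \mathit{Oneof}\ (d_0 \sqcup_{Disjunct} d'_0), \ldots, (d_n \sqcup_{Disjunct} d'_n) \rangle \\[4pt]
\langle {}_{i}\ v_0, \ldots, v_m \rangle \sqcup_{Disjunct} \langle {}_{i}\ u_0, \ldots, u_m \rangle = \\
\quad \langle {}_{i}\ (v_0 \sqcup_V u_0), \ldots, (v_m \sqcup_V u_m) \rangle
\end{array}
\]

Figure 8.2: Oneof least upper bound operators

\[
\begin{array}{l}
\langle \mathit{Oneof}\ d_0, \ldots, d_n \rangle \sqsubseteq_{Oneof} \langle \mathit{Oneof}\ d'_0, \ldots, d'_n \rangle = \bigwedge_{i} d_i \sqsubseteq_{Disjunct} d'_i \\[4pt]
\langle {}_{i}\ v_0, \ldots, v_m \rangle \sqsubseteq_{Disjunct} \langle {}_{i}\ u_0, \ldots, u_m \rangle = \bigwedge_{j} v_j \sqsubseteq_V u_j
\end{array}
\]

Figure 8.3: Oneof ordering operators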



8.1.1 Domains for Abstract Algebraic Types 

We add the following definitions of the abstract Disjunct and Oneof domains and revise the storable 
value domain SV as shown: 

\[
\begin{array}{rcl}
\mathit{Disjunct} & = & \langle {}_{n}\ V, \ldots, V \rangle_{\bot} \\
\mathit{Oneof} & = & \langle \mathit{Oneof}\ \mathit{Disjunct}, \ldots, \mathit{Disjunct} \rangle \\
SV & = & (\mathit{Tuple} + \mathit{Array} + \mathit{Oneof})_{\bot}
\end{array}
\]

Each value in the Disjunct domain is either a tagged tuple of denotable values or bottom. Each
value in the Oneof domain is a tuple of Disjuncts, and storable values (SV) are either tuples,
arrays, oneofs, or bottom. Stores still map abstract object labels to storable values.

Figure 8.1 contains the function Absoneof , which maps standard oneof values into abstract oneof 
values. Figure 8.2 contains the least upper bound operator on abstract oneofs, and Figure 8.3 
contains the ordering operator for abstract oneofs. 
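The product-of-disjuncts representation and its least upper bound can be sketched in Python as follows. This is an illustration only: an abstract oneof is modeled as a tuple with one slot per tag, `None` stands for a bottom disjunct, field values are frozensets, and the names `abs_oneof`, `join_disjunct`, and `join_oneof` are hypothetical. Treating bottom as the identity of the join is an assumption that follows from the Disjunct domain being lifted.

```python
BOTTOM = None  # stands for the bottom disjunct


def abs_oneof(tag, ntags, fields):
    """Abstract a concrete oneof: only the `tag` disjunct is non-bottom."""
    return tuple(tuple(fields) if i == tag else BOTTOM for i in range(ntags))


def join_disjunct(d0, d1):
    """Least upper bound of two disjuncts with the same tag."""
    if d0 is BOTTOM:
        return d1
    if d1 is BOTTOM:
        return d0
    return tuple(v0 | v1 for v0, v1 in zip(d0, d1))


def join_oneof(o0, o1):
    """Least upper bound of two abstract oneofs, disjunct by disjunct."""
    return tuple(join_disjunct(d0, d1) for d0, d1 in zip(o0, o1))
```

Joining the abstraction of a deposit with the abstraction of a withdrawal gives an abstract transaction in which both disjuncts are non-bottom, which is the most defined abstract transaction value described in Section 8.1.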

8.1.2 Abstract Interpretation of Algebraic Types 

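Figure 8.4 contains the clauses of the abstract interpreter that evaluate the primitives that create
and manipulate oneof objects. These clauses are similar to the ones from the instrumented
interpreter, except that they manipulate abstract oneof values.

\[
\begin{array}{l}
\mathcal{E}_A[\![\ {}^{l}\mathit{MakeOneof}_{tag,ntags}(se_1, \ldots, se_m)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ v_1 = \mathcal{SE}_A[\![ se_1 ]\!]\rho; \\
\qquad \vdots \\
\quad\ \ v_m = \mathcal{SE}_A[\![ se_m ]\!]\rho; \\
\quad\ \ ol = \alpha : l; \\
\quad\ \ d_0 = \bot; \\
\qquad \vdots \\
\quad\ \ d_{tag} = \langle {}_{tag}\ v_1, \ldots, v_m \rangle; \\
\qquad \vdots \\
\quad\ \ d_{ntags-1} = \bot; \\
\quad\ \ v_{oneof} = \langle \mathit{Oneof}\ d_0, \ldots, d_{ntags-1} \rangle; \\
\quad\ \ \sigma' = \sigma[ol \rightarrow (v_{oneof} \sqcup \sigma[ol])]; \\
\quad\ \ \mathbf{in}\ \langle \{ol\}, \sigma', \emptyset, \bot_{DMap} \rangle\ \} \\[6pt]
\mathcal{E}_A[\![\ \mathit{Is}_{tag}?(se)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \langle B, \sigma, \emptyset, \bot_{DMap} \rangle \\[6pt]
\mathcal{E}_A[\![\ \mathit{Select}_{tag,i}(se)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ ls = \mathcal{SE}_A[\![ se ]\!]\rho; \\
\quad\ \ \langle \mathit{Oneof}\ d_0, \ldots, d_{tag-1}, d_{tag}, d_{tag+1}, \ldots, d_{ntags-1} \rangle = \bigsqcup_{ol \in ls} \sigma[ol]; \\
\quad\ \ \langle {}_{tag}\ v_1, \ldots, v_m \rangle = d_{tag}; \\
\quad\ \ \mathbf{in}\ \langle v_i, \sigma, \emptyset, \bot_{DMap} \rangle\ \}
\end{array}
\]

Figure 8.4: Abstract interpretation of algebraic type primitives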

8.2 Abstraction of Recursive Types 

Recursive types are a special case of algebraic types. Even though the individual nodes of a 
recursively typed object are of fixed size, the object can have a size that is unbounded. In the 
abstract interpreter, we need some form of spatial summarization that collapses a list or tree object 
of potentially unbounded size into a representation with bounded size. As we see later in this 
section, the spatial summarization of recursive types naturally follows from our abstraction of 
locations and stores. 

Consider the definition of copy_list, shown below. This procedure takes a list and returns a copy
of the list.

def copy_list(l) =
  if Nil?(l)
  then ^l0 Nil
  else { a = Hd(l);
         as = Tl(l);
         l' = ^k0 copy_list(as);
         bs = ^l1 MakeCons(a, l');
         in bs };

The result of a call to copy_list under the standard interpreter is a list whose length is the same 
as the length of the input list 1. 

Abstract interpreters that do not use a store or retrieval function to model structures have difficulty 
abstracting recursive types. Under these interpreters, the result from a call to copy_list would be 
a potentially infinite representation of a list because all procedure calls and both branches of all 
conditionals are evaluated. Consequently, the abstract interpretation would not terminate unless 
some action was taken to bound the size of the representation of the list. 

There are three ways we can bound the sizes of the representations of recursive types in abstract 
interpretation. First, we can compress the domain a priori, as we did with the integer and boolean 
domains. Second, we can apply a generalization, or summarization, operator to such representa- 
tions. Third, we can structure our domains and interpreter so that we can guarantee that no values 
of unbounded size are ever constructed. 

8.2.1 Abstraction of Recursive Types by Domain Compression 

Much of the functional language community has taken the first approach to abstracting recursive 
types. For instance, Wadler compresses the abstract list domain into the following four elements 
for strictness analysis [42]: 

$\top_{\in}$: any finite list, no member of which is $\bot$

$\bot_{\in}$: any finite list, some member of which is $\bot$

$\infty$: any infinite list or approximation to one, except $\bot$

$\bot$: $\bot$

This list domain ensures that the abstract interpretation terminates in a finite amount of time, 
because all list objects have fixed size. 

The list domain defined by Wadler can only capture information about uniform lists. It cannot 
capture information about lists that may begin with a finite sequence of cons cells with non-uniform 
properties followed by a uniform list. Furthermore, it is difficult to see how to define appropriate 
abstract domains for other algebraic types based on this abstraction of lists. 

8.2.2 Abstraction of Recursive Types by Ad Hoc Object Compression 

The second approach to limiting the size of the abstract representation of a recursively typed object
is to apply a compression operator to the representation: the operator generalizes the abstract value
representation and limits the number of nodes in the representation to some arbitrary bound k.
Figure 8.5 shows a representation of a list (a) before, and (b) after compression. The compressed
abstract list contains only two nodes.

Figure 8.5: Abstract list value before and after compression

The drawback to this approach is that it is difficult to choose the bound k on the object represen- 
tation's size that yields the best information for a particular program. In some cases the value of k 
may be too large, resulting in extra overhead during analysis but not providing more information. 
In other cases, the value of k is too small and useful information is obscured. 

8.2.3 Abstraction of Recursive Types by Object Label Compression 

The third approach, and the approach taken in this thesis, is to structure the domains and in- 
terpreter in order to guarantee the finiteness of the representation of any list or algebraic type. 
The use of stores (or other retrieval functions) that map a finite number of abstract object labels 
to abstract storable values guarantees that the size of a recursively typed object representation 
remains finite. All nodes with identical labels are coalesced into a single node. Thus, there will
only be a finite number of distinct nodes in the representation of any object or group of objects. 

We defined our abstract activation labels, object labels, denotable values, storable values and stores 
so that we could analyze programs containing only tuples. We then augmented the storable value 
domains so that we could analyze arrays and non-recursive algebraic types. With this abstraction 
we can also analyze programs that use recursive types because the number of distinct abstract 
objects in a program is bounded by the size of the object label domain. The abstract object label 
domain is bounded in size by the number of paths through the call-graph of a program (disregarding 
cycles). 

8.2.4 Spatial Summarization in Recursively Typed Objects 

The abstraction of recursively typed objects involves a form of spatial summarization, as did the
abstraction of arrays. In arrays, we summarized a single object whose size was known only at run
time by an abstract object with a single component. In recursive types, we summarize a concrete
graph or tree containing an unknown number of nodes by a graph with a fixed number of abstract
nodes. Any two nodes in the concrete graph whose object labels map to the same abstract object
label will be summarized by a single abstract object.

For example, consider the type tree: 

type tree = node tree tree | leaf N; 

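and the following value and store that represent a concrete tree object:

\[
\begin{array}{l}
\langle\ \epsilon : l_0, \\
\quad \epsilon : l_0 \rightarrow \langle {}_{0,2}\ \epsilon.k_1 : l_0, \epsilon.k_2 : l_0 \rangle, \\
\quad \epsilon.k_1 : l_0 \rightarrow \langle {}_{0,2}\ \epsilon.k_1.k_1 : l_1, \epsilon.k_1.k_2 : l_1 \rangle, \\
\quad \epsilon.k_2 : l_0 \rightarrow \langle {}_{0,2}\ \epsilon.k_2.k_1 : l_1, \epsilon.k_2.k_2 : l_0 \rangle, \\
\quad \epsilon.k_2.k_2 : l_0 \rightarrow \langle {}_{0,2}\ \epsilon.k_2.k_2.k_1 : l_1, \epsilon.k_2.k_2.k_2 : l_1 \rangle, \\
\quad \epsilon.k_1.k_1 : l_1 \rightarrow \langle {}_{1,2}\ 1 \rangle, \\
\quad \epsilon.k_1.k_2 : l_1 \rightarrow \langle {}_{1,2}\ 2 \rangle, \\
\quad \epsilon.k_2.k_1 : l_1 \rightarrow \langle {}_{1,2}\ 3 \rangle, \\
\quad \epsilon.k_2.k_2.k_1 : l_1 \rightarrow \langle {}_{1,2}\ 4 \rangle, \\
\quad \epsilon.k_2.k_2.k_2 : l_1 \rightarrow \langle {}_{1,2}\ 5 \rangle\ \rangle
\end{array}
\]

This tree consists of 4 interior nodes and 5 leaf nodes.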

This tree consists of 4 interior nodes and 5 leaf nodes. 

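If we abstract the activation labels appearing in this representation to the following set:

\[ \{\epsilon.(k_1 + k_2)^*\}, \]

then the abstract tree representation collapses to the following 2-node abstract value and store:

\[
\begin{array}{l}
\langle\ \{\epsilon.(k_1 + k_2)^* : l_0\}, \\
\quad \epsilon.(k_1 + k_2)^* : l_0 \rightarrow \left\langle \mathit{Oneof}\ \left\langle {}_{0}\ \left\{ \begin{array}{l} \epsilon.(k_1 + k_2)^* : l_0, \\ \epsilon.(k_1 + k_2)^* : l_1 \end{array} \right\}, \left\{ \begin{array}{l} \epsilon.(k_1 + k_2)^* : l_0, \\ \epsilon.(k_1 + k_2)^* : l_1 \end{array} \right\} \right\rangle, \bot \right\rangle, \\
\quad \epsilon.(k_1 + k_2)^* : l_1 \rightarrow \langle \mathit{Oneof}\ \bot, \langle {}_{1}\ N \rangle \rangle\ \rangle
\end{array}
\]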




This representation consists of only two abstract nodes. The abstraction of activation labels is 
normally derived from the call-graph of a program. 
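The collapse of the concrete tree into two abstract nodes can be reproduced with a short Python sketch. It rests on several simplifying assumptions that are not from the thesis: a concrete label is a pair (activation path, allocation site), label abstraction simply drops the path, each site allocates only one kind of node, and the leaf values 1 through 5 are illustrative. The names `abstract_label` and `collapse` are hypothetical.

```python
def abstract_label(label):
    """Forget the activation path of a (path, site) label, keeping the site."""
    _path, site = label
    return site


def collapse(concrete_store):
    """Merge concrete tree nodes whose labels map to the same abstract label."""
    abstract = {}
    for label, node in concrete_store.items():
        site = abstract_label(label)
        if node[0] == "node":
            left, right = abstract.get(site, (frozenset(), frozenset()))
            abstract[site] = (left | {abstract_label(node[1])},
                              right | {abstract_label(node[2])})
        else:
            abstract[site] = "N"  # all leaf numbers abstract to N
    return abstract


tree = {
    ("", "l0"): ("node", ("k1", "l0"), ("k2", "l0")),
    ("k1", "l0"): ("node", ("k1.k1", "l1"), ("k1.k2", "l1")),
    ("k2", "l0"): ("node", ("k2.k1", "l1"), ("k2.k2", "l0")),
    ("k2.k2", "l0"): ("node", ("k2.k2.k1", "l1"), ("k2.k2.k2", "l1")),
    ("k1.k1", "l1"): ("leaf", 1),
    ("k1.k2", "l1"): ("leaf", 2),
    ("k2.k1", "l1"): ("leaf", 3),
    ("k2.k2.k1", "l1"): ("leaf", 4),
    ("k2.k2.k2", "l1"): ("leaf", 5),
}
summary = collapse(tree)
```

The nine concrete nodes are summarized by two abstract nodes: the interior-node site, both of whose fields may refer to either site, and the leaf site, which holds an abstract number.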

We do not add anything to the abstract domains or the abstract interpreter in this section. The 
complexity we added in the basic framework has paid off by being general enough to handle recur- 
sively typed objects. In the following section, we describe an extension to the abstract interpreter 
to model lists as a special case of Oneofs. 

8.3 Abstraction of Lists in KID⁻



The list type, which is a particular recursive algebraic type, could be modeled using our Oneof 
domain. However, our implementation of the deallocation command insertion algorithm generates 
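specialized code to deallocate lists, so we model lists separately in our abstract interpreter.

\[
\begin{array}{l}
\mathit{Abs}_{List}(\langle \mathit{Cons}\ v_0, v_1 \rangle) = \langle \mathit{List}\ \langle \mathit{Cons}\ \mathit{Abs}_V(v_0), \mathit{Abs}_V(v_1) \rangle, \bot \rangle \\
\mathit{Abs}_{List}(\langle \mathit{Nil} \rangle) = \langle \mathit{List}\ \bot, \langle \mathit{Nil} \rangle \rangle
\end{array}
\]

Figure 8.6: List abstraction operator

\[
\begin{array}{l}
\langle \mathit{List}\ c_0, n_0 \rangle \sqcup_{List} \langle \mathit{List}\ c_1, n_1 \rangle = \langle \mathit{List}\ (c_0 \sqcup_{Cons} c_1), (n_0 \sqcup_{Nil} n_1) \rangle \\
\langle \mathit{Cons}\ v_0, v_1 \rangle \sqcup_{Cons} \langle \mathit{Cons}\ w_0, w_1 \rangle = \langle \mathit{Cons}\ (v_0 \sqcup_V w_0), (v_1 \sqcup_V w_1) \rangle \\
\langle \mathit{Nil} \rangle \sqcup_{Nil} \langle \mathit{Nil} \rangle = \langle \mathit{Nil} \rangle
\end{array}
\]

Figure 8.7: List least upper bound operators

\[
\begin{array}{l}
\langle \mathit{List}\ c_0, n_0 \rangle \sqsubseteq_{List} \langle \mathit{List}\ c_1, n_1 \rangle = (c_0 \sqsubseteq_{Cons} c_1) \wedge (n_0 \sqsubseteq_{Nil} n_1) \\
\langle \mathit{Cons}\ v_0, v_1 \rangle \sqsubseteq_{Cons} \langle \mathit{Cons}\ w_0, w_1 \rangle = (v_0 \sqsubseteq_V w_0) \wedge (v_1 \sqsubseteq_V w_1) \\
\langle \mathit{Nil} \rangle \sqsubseteq_{Nil} \langle \mathit{Nil} \rangle = \mathit{True}
\end{array}
\]

Figure 8.8: List ordering operators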

8.3.1 Abstract List Domains 

The definition of the abstracted list domain that we use follows: 

\[
\begin{array}{rcll}
v_{list} \in \mathit{List} & = & \langle \mathit{List}\ \langle \mathit{Cons}\ V, Ls \rangle_{\bot}, \langle \mathit{Nil} \rangle_{\bot} \rangle & \text{Lists} \\
sv \in SV & = & (\mathit{Tuple} + \mathit{Array} + \mathit{Oneof} + \mathit{List})_{\bot} & \text{Storable Values}
\end{array}
\]

Abstract lists, like abstract oneofs, are represented by a pair of tagged disjuncts. If one of these
components is bottom, then none of the concrete values represented by this abstract list can be
the corresponding disjunct (Cons or Nil). If both of these are non-bottom, then the corresponding
standard values could be either Cons or Nil.

Figure 8.6 contains the function $\mathit{Abs}_{List}$, which maps standard list values into abstract list values.
Figure 8.7 contains the least upper bound operator on abstract lists, and Figure 8.8 contains the
ordering operator for abstract lists.

This list abstraction is safe, in that it preserves the reachability of the list elements. Abstract 
list representations may suffer from spatial summarization and lose information about whether the 
cons cells are shared. If a list is constructed from distinct invocations of Cons, then the analysis 
will obtain complete information (as with tuples). However, if a list is constructed by a recursive 
procedure, then the calls to Cons will not have distinct activation labels, and a cyclic representation 
will be constructed. 

The compiler must assume that any list whose abstract representation is cyclic may represent a 
cyclic list. The compiler must also assume that the objects pointed to by a cyclic abstract list may 
be shared. Therefore, if the compiler inserts code to deallocate a list, it must ensure that each 
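distinct cons cell in the list is deallocated only once.

\[
\begin{array}{l}
\mathcal{E}_A[\![\ {}^{l}\mathit{Cons}(se_1, se_2)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ v_1 = \mathcal{SE}_A[\![ se_1 ]\!]\rho; \\
\quad\ \ v_2 = \mathcal{SE}_A[\![ se_2 ]\!]\rho; \\
\quad\ \ v_{list} = \langle \mathit{List}\ \langle \mathit{Cons}\ v_1, v_2 \rangle, \bot \rangle; \\
\quad\ \ ol = \alpha : l; \\
\quad\ \ v'_{list} = \sigma[ol]; \\
\quad\ \ \sigma' = \sigma[ol \rightarrow (v_{list} \sqcup v'_{list})]; \\
\quad\ \ \mathbf{in}\ \langle \{ol\}, \sigma', \emptyset, \bot_{DMap} \rangle\ \} \\[6pt]
\mathcal{E}_A[\![\ \mathit{Hd}(se)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ ls = \mathcal{SE}_A[\![ se ]\!]\rho; \\
\quad\ \ \langle \mathit{List}\ \langle \mathit{Cons}\ v_1, v_2 \rangle, \langle \mathit{Nil} \rangle \rangle = \langle \mathit{List}\ \langle \mathit{Cons}\ \bot, \bot \rangle, \bot \rangle \sqcup \bigsqcup_{ol \in ls} \sigma[ol]; \\
\quad\ \ \mathbf{in}\ \langle v_1, \sigma, \emptyset, \bot_{DMap} \rangle\ \} \\[6pt]
\mathcal{E}_A[\![\ \mathit{Tl}(se)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ ls = \mathcal{SE}_A[\![ se ]\!]\rho; \\
\quad\ \ \langle \mathit{List}\ \langle \mathit{Cons}\ v_1, v_2 \rangle, \langle \mathit{Nil} \rangle \rangle = \langle \mathit{List}\ \langle \mathit{Cons}\ \bot, \bot \rangle, \bot \rangle \sqcup \bigsqcup_{ol \in ls} \sigma[ol]; \\
\quad\ \ \mathbf{in}\ \langle v_2, \sigma, \emptyset, \bot_{DMap} \rangle\ \} \\[6pt]
\mathcal{E}_A[\![\ {}^{l}\mathit{Nil}()\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \{\ v_{list} = \langle \mathit{List}\ \bot, \langle \mathit{Nil} \rangle \rangle; \\
\quad\ \ ol = \alpha : l; \\
\quad\ \ \sigma' = \sigma[ol \rightarrow (v_{list} \sqcup \sigma[ol])]; \\
\quad\ \ \mathbf{in}\ \langle \{ol\}, \sigma', \emptyset, \bot_{DMap} \rangle\ \} \\[6pt]
\mathcal{E}_A[\![\ \mathit{Nil?}(se)\ ]\!]\ \rho\,\sigma\,\alpha\,\Phi = \\
\quad \langle B, \sigma, \emptyset, \bot_{DMap} \rangle
\end{array}
\]

Figure 8.9: Abstracted evaluation of list primitives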



8.3.2 Additions to Abstract Interpreter 

Figure 8.9 contains the evaluation rules for list primitives in the abstract interpreter. They are 
similar to the instrumented evaluation rules, except that labels are abstracted and lists are products 
of Cons and Nil objects. 



8.3.3 Representative List Inputs 



To use our deallocation command safety verification algorithm and deallocation command insertion 
algorithm, we must have a method to construct representative input values for objects of recursive 
types. This section defines the clause of procedure CV that constructs representative list arguments. 

The values we pass as inputs to procedures under analysis must be of the correct type and must be
detectable wherever they are passed within the procedure. This constrains the abstract list values
that we may use as representative inputs to a function expecting a list. We cannot tell a priori how
many of the cons cells in a list input are dereferenced by a procedure; consequently, the abstract
list must either have infinitely many cons cells or be circular. We use circular list representations
as representative inputs. The clause of procedure CV that constructs representative list values,
shown below, returns a circular abstract list whose head contains an element of the correct type.

\[
\begin{array}{l}
CV[\![\ \mathit{List}\ \tau\ ]\!] = \{\ \langle v, \sigma' \rangle = CV[\![ \tau ]\!]; \\
\qquad\qquad\qquad\ \ l = \mathit{Newloc}(); \\
\qquad\qquad\qquad\ \ \sigma'' = \sigma'[l \rightarrow \langle \mathit{List}\ \langle \mathit{Cons}\ v, \{l\} \rangle, \langle \mathit{Nil} \rangle \rangle]; \\
\qquad\qquad\qquad\ \ \mathbf{in}\ \langle \{l\}, \sigma'' \rangle\ \}
\end{array}
\]

The rationale for the safety of using circular lists as inputs is that any particular list under the 
standard interpreter, composed of a finite number of cons cells labeled l₀ through lₙ and containing 
values v₀ through vₙ in the heads of the cons cells, can be contained by a cyclic abstract list. We say 
list representation r₀ contains r₁ if (r₀ ⊒_Abs r₁). The abstraction of such a list is contained by the 
following abstract value-store pair: 

⟨ {l₀, …, lₙ}, ⊥_Store[ l₀ → ⟨List ⟨Cons v, {l₀, …, lₙ}⟩, ⟨Nil⟩⟩,
                        …
                        lₙ → ⟨List ⟨Cons v, {l₀, …, lₙ}⟩, ⟨Nil⟩⟩ ] ⟩

where v is the least upper bound of the elements of the concrete list, v₀ through vₙ, and {l₀, …, lₙ} 
is the abstraction of the locations where the concrete list resides. 

Therefore, we can show that the analysis of a function using a circular abstract list representation 
is safe for any list to which the function is actually applied. We show this by substituting the set of 
labels of the Cons cells in the actual list for the label of the Cons in the circular list and substituting 
the least upper bound of the elements of the actual list for the element of the representative list. 
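
The construction of a circular representative list can be illustrated with a small Python sketch. It is not the thesis implementation: the `AbsList` record, the `representative_list` function, and the negative-integer label counter standing in for Newloc are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, FrozenSet, Optional, Tuple

@dataclass(frozen=True)
class AbsList:
    """Abstract list cell: a product of a Cons part and a Nil part."""
    cons: Optional[Tuple[str, FrozenSet[int]]]  # (head, tail labels); None is bottom
    nil: bool                                   # True when the cell may be nil

# Hypothetical stand-in for Newloc: hands out labels -1, -2, ...
_fresh = count(-1, -1)

def representative_list(elem: str, store: Dict[int, AbsList]):
    """Return ({l}, store') where l holds a cell whose tail is l itself or nil."""
    label = next(_fresh)
    new_store = dict(store)
    new_store[label] = AbsList(cons=(elem, frozenset({label})), nil=True)
    return frozenset({label}), new_store

value, store = representative_list("N", {})
(label,) = value
```

Because the single cell points back to itself, any number of Tl steps over this value stays inside the same label set, which is what lets one cell stand for a list of unknown length.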

8.3.4 List Examples 

Now let us examine a couple of examples that use lists, in order to see what information we can 
gather and how far we can go with them. Procedure scale_list takes a number and a list of 
numbers and returns a new list of scaled numbers; procedure inc_list takes a number and a list 
of numbers and returns a list of incremented numbers. 

def scale_list (s, l) = 
  if nil?(l) then ^l0 nil 
  else { x = hd(l); 
         xs = tl(l); 
         x' = s * x; 
         sl = ^k0 scale_list(s, xs); 
         r = ^l1 Cons(x', sl); 
       in r }; 

def inc_list (delta, l) = 
  if nil?(l) then ^l2 nil 
  else { x = hd(l); 
         xs = tl(l); 
         x' = delta + x; 
         sl = ^k1 inc_list(delta, xs); 
         r = ^l3 Cons(x', sl); 
       in r }; 
The representative input vector for scale_list is 

⟨ N, {l₋₁}, ⊥_Store[ l₋₁ → ⟨List ⟨Cons N, {l₋₁}⟩, ⟨Nil⟩⟩ ] ⟩

This represents a number and a list whose head is a number and whose tail is either nil or the list 
itself. 

The value returned from scale_list is another list, consisting of a cons in location l₁ or nil in 
location l₀: 

⟨ {l₀, l₁}, ⊥_Store[ l₋₁ → ⟨List ⟨Cons N, {l₋₁}⟩, ⟨Nil⟩⟩,
                     l₀ → ⟨List ⊥, ⟨Nil⟩⟩,
                     l₁ → ⟨List ⟨Cons N, {l₀, l₁}⟩, ⊥⟩ ] ⟩

From this value we can determine that the list passed into function scale_list cannot be reached 
from its result. Furthermore, we know that the result is a list that must have been allocated within 
the call to scale_list. 
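
The escape check described here amounts to a reachability computation over the abstract store. The following Python sketch assumes a dictionary encoding of the store shown above, with integer labels -1, 0 and 1; the `reachable` helper is hypothetical and follows only cons tails, since the heads in this example are scalars.

```python
def reachable(labels, store):
    """Return every label reachable from `labels` through cons tails in `store`."""
    seen, work = set(), list(labels)
    while work:
        label = work.pop()
        if label in seen:
            continue
        seen.add(label)
        cons = store[label]["cons"]
        if cons is not None:        # bottom cons part contributes no tails
            work.extend(cons[1])
    return seen

store = {
    -1: {"cons": ("N", {-1}), "nil": True},     # representative input list
     0: {"cons": None, "nil": True},            # nil allocated at l0
     1: {"cons": ("N", {0, 1}), "nil": False},  # cons allocated at l1
}
result_labels = reachable({0, 1}, store)
input_labels = reachable({-1}, store)
shares_with_input = bool(result_labels & input_labels)
```

An empty intersection between the two reachable sets is what justifies the claim that the input list cannot be reached from the result.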

We obtain similar results when we analyze procedure inc_list. What is the behavior if we compose 
these two functions, as in procedure inc_scale_list? 

def inc_scale_list (delta, s, x) = 
  { x' = ^k2 scale_list(s, x); 
    r = ^k3 inc_list(delta, x'); 
  in r }; 



If we evaluate the bindings in the body of inc_scale_list when applied to the following input 
vector: 

⟨ N, N, {l₋₂}, ⊥_Store[ l₋₂ → ⟨List ⟨Cons N, {l₋₂}⟩, ⟨Nil⟩⟩ ] ⟩

we obtain the following values: 

ρ′ = ⊥_Env[ delta → N,
            s → N,
            x → {l₋₂},
            x' → {l₀, l₁},
            r → {l₂, l₃} ]

σ′ = ⊥_Store[ l₋₂ → ⟨List ⟨Cons N, {l₋₂}⟩, ⟨Nil⟩⟩,
              l₀ → ⟨List ⊥, ⟨Nil⟩⟩,
              l₁ → ⟨List ⟨Cons N, {l₀, l₁}⟩, ⊥⟩,
              l₂ → ⟨List ⊥, ⟨Nil⟩⟩,
              l₃ → ⟨List ⟨Cons N, {l₂, l₃}⟩, ⊥⟩ ]

I = {l₋₂}
R = {l₂, l₃}

The set I of labels reachable from the body of inc_scale_list contains l₋₂, and R, the set of labels 
reachable from the result of inc_scale_list's body, contains l₂ and l₃. From this we can conclude 
that the object to which x' is bound, consisting of locations l₀ and l₁, must have been allocated 
within the body of inc_scale_list and that this object does not escape from there. Consequently, 
we can insert a deallocation command for variable x'. 

What should this deallocation command do? The whole list that has been allocated is garbage, but 
we cannot determine the size of the list from the abstract values. Nor can we determine if there is 
any sharing in the list. We can insert code to deallocate this whole list as long as the code checks 
that it never deallocates the same cons cell twice along the spine of the list. 

Consider the following example, which constructs a circular list: 

def cyclic_list (elt) = 
  { r = ^l4 Cons(elt, r); 
  in r }; 

Procedure cyclic_list, applied to N, returns the value 

⟨ {l₄}, ⊥_Store[ l₄ → ⟨List ⟨Cons N, {l₄}⟩, ⊥⟩ ] ⟩

Because the nil component of the list in location l₄ is bottom, we can conclude that this list never 
has a null tail: it is either infinite or cyclic. However, we still cannot tell how many cons cells 
the list will have under the standard interpreter. 

Whenever we determine that the lifetime of a list is bounded by some control region, we will insert 
code that recursively deallocates all distinct cons cells of the list upon termination of that control 
region. In Chapter 10, we discuss the run-time performance of the code that deallocates potentially 
cyclic lists compared to the code that deallocates acyclic lists. 
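
A minimal Python sketch of deallocation code that is safe for potentially cyclic lists is given below. It is an illustration only: the `Cons` class, the `free` callback standing in for the run-time deallocation primitive, and the use of a visited set to detect repeated cells are assumptions, and the loop is iterative rather than recursive.

```python
class Cons:
    def __init__(self, head, tail=None):
        self.head, self.tail = head, tail

def deallocate_list(cell, free):
    """Free each distinct cons cell along the spine exactly once.

    `free` is a hypothetical callback standing in for the run-time
    system's deallocation primitive. Returns the number of cells freed.
    """
    seen = set()
    while cell is not None and id(cell) not in seen:
        seen.add(id(cell))
        nxt = cell.tail      # read the tail before releasing the cell
        free(cell)
        cell = nxt
    return len(seen)

a = Cons(1)
b = Cons(2, a)
a.tail = b                   # two-cell cycle: a -> b -> a
freed = []
count_cyclic = deallocate_list(a, freed.append)
count_acyclic = deallocate_list(Cons(1, Cons(2, Cons(3))), freed.append)
```

The visited set is the extra cost paid for cyclic lists; for a list known to be acyclic, the membership test can be dropped and the loop simply runs to the nil tail.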



Chapter 9 



Higher-Order Functions 



Many modern programming languages have higher-order functions. That is, one can pass procedures 
around as values. Procedures can take procedures as arguments and return procedures 
as values. This ability to pass procedures around provides a great deal of flexibility in writing 
programs. 

Unfortunately, many approaches to lifetime analysis that use abstract interpretation do not model 
higher-order functions. One of the main difficulties in the abstraction of procedure values is how 
to take the least upper bound of two functions. The least upper bound is well-defined theoretically 
as long as the two functions have the same domains and ranges. If we have functions f₀ and f₁, 
then the least upper bound can be defined as a new function: 

f₀ ⊔ f₁ = λx.(f₀(x) ⊔ f₁(x))

However, this definition is not always conducive to an implementation. The key here is to separate 
the text of the function from the object being passed around as a value. The approach taken 
in this thesis is to represent functions as closures, which consist of the name of a function and the 
values the function is closed over. The name of a function points to the text of the function, that is, its 
definition in the program. We allow a prefix of a function's arguments to be provided in a closure 
of the function (a partial application), and the rest must be provided when the closure is applied. 

The second major difficulty in the abstraction of procedure values is that the domain of functions 
of a given type is infinite, and so it is no longer possible to enumerate the input-output behaviors 
of a function that takes procedures as arguments over all possible procedures in the domain. We 
limit the domains of functions to contain only closures of functions that are defined in the program. 
There can only be finitely many functions defined in a program, and only finitely many points 
where those functions are closed over values. 

In this chapter, we see how to add higher-order functions to KID⁻ and how to improve the abstraction of 
activation labels. In the first two sections we discuss the implementation of higher-order functions 
in the KID⁻ interpreters. In the final section we present an analysis example using higher-order 
functions. 



9.1 Higher-Order Functions in the Instrumented Interpreter 

In this section, we discuss the changes that need to be made to the domains and interpreters in 
order to add higher-order functions to KID⁻. We do this for the instrumented and abstracted 
interpreters. In order to add higher-order functions, we need to add primitives to the language 
to create and apply function values, and we need to add value domains for representing function 
values. 



9.1.1 The Closure Domain 

A higher-order function consists of the text of a procedure plus the lexical environment in which 
the function was defined. Higher-order functions are most interesting when these functions can be 
defined in lexical environments other than the global environment. Rather than extending KID⁻ 
with nested function definitions, however, we preserve the flat structure of definitions and introduce 
a primitive that binds together a particular procedure definition and values from the desired lexical 
environment. We represent functions as closures. Closures are a new kind of storable value. 

cls ∈ Cls = ⟨Cls F, V, …, V⟩                      Closure 

sv ∈ SV = Tuple + Array + List + Cls              Storable Values 

A closure consists of a tuple of a procedure name fᵢ and n values. If a function fᵢ has r arguments, 
then a closure of fᵢ over n values, where n < r, may be applied to exactly (r − n) values. 
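
The arity rule for closures can be made concrete with a short Python sketch. Everything in it is hypothetical scaffolding: the `DEFS` table of top-level definitions, the `Closure` record, and the `make_closure` and `apply_closure` helpers are illustrative names, not the primitives of the thesis.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical table of top-level definitions: name -> (arity r, body).
DEFS: Dict[str, Tuple[int, Callable]] = {
    "foo": (2, lambda n, m: (n, n + m)),
}

@dataclass(frozen=True)
class Closure:
    name: str         # names the text of the function
    captured: tuple   # the n values the function is closed over

def make_closure(name, *values):
    arity, _ = DEFS[name]
    if not len(values) < arity:          # the text requires n < r
        raise ValueError("a closure must leave at least one argument open")
    return Closure(name, values)

def apply_closure(closure, *args):
    arity, body = DEFS[closure.name]
    if len(args) != arity - len(closure.captured):
        raise TypeError("a closure must be applied to exactly r - n values")
    return body(*closure.captured, *args)

result = apply_closure(make_closure("foo", 10), 5)
```

Keeping the function name separate from the captured values is the point of the representation: two closures can be compared or joined by looking at the name and the value tuple, without inspecting the function text.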

9.1.2 Instrumented Interpretation of Closure Primitives 

We add two primitives to KID⁻ for creating and manipulating closures: 
MakeClosure_fᵢ, which closes the procedure named fᵢ over some set of argument values, and Apply, 
which applies a closure to a set of values. We do not support currying directly with these 
primitives. The compiler can generate a sequence of intermediate functions that use MakeClosure 
and Apply to implement currying. This is described fully in Hochheiser [21]. 

The primitive MakeClosure is subscripted with the name fᵢ of the function being closed, and takes 
n values over which fᵢ is being closed. MakeClosure is similar to the MakeTuple primitive in that 
the expression label is used to construct a unique label ol of the structure being allocated. Note 
also that the set of allocation events is augmented to show that ol was allocated in the current 
activation α. 

E_I⟦ ^l MakeClosure_fᵢ(se₁, …, seₙ) ⟧ ρ σ α = 
    { v₁ = SE⟦ se₁ ⟧ ρ; 
      … 
      vₙ = SE⟦ seₙ ⟧ ρ; 
      ol = α : l; 
      σ′ = σ[ol → ⟨Cls fᵢ, v₁, …, vₙ⟩]; 
    in ⟨ol, σ′, {(ol, α)}, ∅, ∅⟩ } 

The primitive Apply takes as its first argument a closure of a function fᵢ with arity r and n values 
over which the function is closed. There must be (r − n) more values supplied to Apply, so that it can 
make a full-arity application of function fᵢ. This primitive is similar to user function applications. 
First, we evaluate the arguments and dereference the closure from the incoming store. Then we 
evaluate the body of the closed function fᵢ in the new activation α′ and the proper environment, 
constructed from the values from the closure and from the inputs to Apply. 

E_I⟦ ^k Apply(se, seₙ₊₁, …, seᵣ) ⟧ ρ σ α = 
    { ol = SE⟦ se ⟧ ρ; 
      vₙ₊₁ = SE⟦ seₙ₊₁ ⟧ ρ; 
      … 
      vᵣ = SE⟦ seᵣ ⟧ ρ; 
      ⟨Cls fᵢ, v₁, …, vₙ⟩ = σ[ol]; 
      α′ = α.k; 
      ⟨v, σ′, A⁺, A⁻, A^R⟩ = E_I⟦ eᵢ ⟧ (ρ[v₁/x₁, …, vₙ/xₙ, vₙ₊₁/xₙ₊₁, …, vᵣ/xᵣ]) σ α′; 
      A^R′ = A^R ∪ {(ol, α)}; 
    in ⟨v, σ′, A⁺, A⁻, A^R′⟩ } 
    where fᵢ(x₁, …, xᵣ) = eᵢ is a definition in the program 

We return a reference event for the closure object ol in activation α. 



9.2 Higher-Order Functions in the Abstract Interpreter 

This section defines the abstract closure domains and the clauses of the abstract interpreter that 
interpret the MakeClosure and Apply primitives. 

9.2.1 Abstracting The Closure Domain 

Abstraction of the closure domain is rather straightforward. We do not attempt to abstract the 
code text of a closure. Rather, we generalize a closure to the set of possible closures that it could be. 
This abstraction fits in nicely with our abstraction of storable values: a reference to the abstraction 
of a closure is a set of abstract object labels, each of which refers to an abstract closure. An 
abstract closure storable value consists of a single function name and a tuple of values over which 
that function is closed. The number of components in the tuple must be less than the number of 
arguments that the function takes. 

v ∈ V = (N + B + Ls)_⊥                            Denotable values 

cls ∈ Cls = ⟨Cls F, V, …, V⟩                      Abstract Closure 

sv ∈ SV = Tuple + Array + List + Cls              Storable Values 

An application of a reference to a set of abstract closures has to return the least upper bound of 
the values returned by applying each of the abstract closures to the abstract argument values. 

In addition to abstracting the closure domain, we must choose a domain of activation labels. We 
have seen two choices for AL so far, the simple one from Chapter 4 and the more detailed one 
from Chapter 6. The more detailed abstraction requires knowledge of the complete call graph 
in order to define a function MAC that takes an abstract activation label and an expression label 
and returns a new activation label. We cannot statically compute the call graph of a program that uses 
higher-order functions, because the names of the functions that will be invoked by an 
application primitive are not known, in general, until run-time. 

Abs_Cls(⟨Cls fᵢ, v₁, …, vₙ⟩) = 
    ⟨Cls fᵢ, Abs_V(v₁), …, Abs_V(vₙ)⟩ 

Figure 9.1: Closure abstraction operator 



For programs with higher-order functions, we use a simpler definition of the AL domain and a 
simple function MAC to compute the next activation label. The definition of the domain, shown 
below, is the same as the AL domain used in the standard and instrumented interpreters. 

AL = ε | AL.L 

However, the next activation label function guarantees that the set of activation labels remains 
finite. The set of activation labels is finite except for recursive functions, so MAC treats the 
activation labels of recursive function calls specially: 

MAC(α, k) = α′.k    if α = α′.k.β 
            α.k     otherwise 

The motivation for this definition of MAC is that the activation labels of procedures that are 
called recursively will contain repeated expression labels. Under the standard interpreter, the next 
activation label from α′.k.β given expression label k would be: 

α = α′.k.β.k 

so the function MAC limits this to one occurrence of k: 

α = α′.k 

by eliding the sequence of expression labels k.β. Note that β is empty for singly recursive functions, 
and it is non-empty for functions that contain multiple recursive calls or for groups of mutually 
recursive functions. 

An alternative definition of MAC, which may be more desirable because it further restricts the size 
of the activation label domain, is defined below. 

MAC(α, k) = k 

This definition may yield detailed enough activation labels for most purposes. 

Of course, one could use the original definition of abstract activation labels from Chapter 4, which 
corresponds to the following definition of MAC. 

MAC(α, k) = ε 

This definition yields the smallest possible domain of activation labels. 
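
The three candidate definitions of MAC can be compared side by side in a small Python sketch. Activation labels are modeled here as tuples of expression labels, with the empty tuple playing the role of ε; that encoding and the function names are assumptions made for illustration.

```python
def mac(alpha, k):
    """Truncate at an earlier occurrence of k; otherwise append k."""
    if k in alpha:
        # alpha = alpha'.k.beta  ->  alpha'.k  (beta may be empty)
        return alpha[: alpha.index(k) + 1]
    return alpha + (k,)

def mac_call_site_only(alpha, k):
    """Alternative definition: keep only the most recent expression label."""
    return (k,)

def mac_trivial(alpha, k):
    """Chapter 4 style definition: every activation gets the empty label."""
    return ()

direct = mac(("k0",), "k0")            # singly recursive call, beta empty
mutual = mac(("k0", "k1"), "k0")       # recursion through another call site
fresh = mac(("k0",), "k1")             # non-recursive call
```

Since the first definition never lets an expression label repeat, a program with finitely many call sites yields finitely many activation labels.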

Figure 9.1 contains the function Abs_Cls, which maps standard closure values into abstract closure 
values. Figure 9.2 contains the least upper bound operator on abstract closures, and Figure 9.3 
contains the ordering operator for abstract closures. 

⟨Cls f, v₁, …, vₙ⟩ ⊔_Cls ⟨Cls g, w₁, …, wₙ⟩ = 
    ⟨Cls f, (v₁ ⊔_V w₁), …, (vₙ ⊔_V wₙ)⟩    if f = g 
    ⊤                                        otherwise 

Figure 9.2: Closure least upper bound operators 

⟨Cls f, v₁, …, v_nf⟩ ⊑_Cls ⟨Cls g, w₁, …, w_ng⟩ = 
    ⋀ᵢ (vᵢ ⊑_V wᵢ)    if f = g and n_f = n_g 
    T                 otherwise 

Figure 9.3: Closure ordering operators 
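
The least upper bound operator of Figure 9.2 can be sketched in Python as follows. The sketch simplifies heavily: closures are `(name, values)` pairs, and component values live in a toy flat lattice in which `None` is bottom and the string `TOP` is top, whereas the thesis uses a richer value domain. All names are hypothetical.

```python
TOP = "TOP"   # hypothetical marker for the top element

def lub_value(v, w):
    """Join in a toy flat lattice: None is bottom, TOP is top."""
    if v is None:
        return w
    if w is None or v == w:
        return v
    return TOP

def lub_closure(c, d):
    """Join two abstract closures componentwise when they name the same function."""
    (f, vs), (g, ws) = c, d
    if f != g or len(vs) != len(ws):
        return TOP
    return (f, tuple(lub_value(v, w) for v, w in zip(vs, ws)))

same = lub_closure(("foo", ("N",)), ("foo", ("N",)))
refined = lub_closure(("foo", (None,)), ("foo", ("N",)))
mixed = lub_closure(("foo", ("N",)), ("bar", ("B", "N")))
```

Joining closures of different functions yields top, which is why the abstract interpreter keeps such closures apart under distinct object labels and joins only the results of applying them.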



9.2.2 Termination of Abstract Interpretation 

The KID⁻ type system guarantees that all closures that are created are of finite depth, where the 
maximum depth can be fixed at compile-time. This fact, plus the fact that there are only a fixed 
number of procedure texts and MakeClosure expressions in a given program, guarantees that there 
can be only a finite number of possible values for any given abstract closure arising during the 
abstract interpretation of a program. Thus, abstract interpretation of a program still takes a finite 
number of iterations to compute the function environment. 

9.2.3 Abstract Interpretation of Closure Primitives 

We also add the clauses to the abstract interpreter for the two primitives that create and manipulate 
closures: MakeClosure_fᵢ, which closes the procedure named fᵢ over some set of argument values, and 
Apply, which applies a closure to a set of values. The clause for MakeClosure, shown in Figure 9.4, 
uses the static expression label l alone as the object label of the allocated closure. 

The clause for Apply, shown in Figure 9.5, first interprets the first argument to yield a set ls of 
references to abstract closures. The result is the least upper bound of the results of invoking each 
of these closures with the arguments supplied to Apply as well as the values carried in the closures. 
Each abstract closure is invoked by determining the function name fᵢ and the argument values, 
and then looking up the entry for fᵢ and those values in the function environment Φ. In addition, 
these values are added to the domain map Δ for function fᵢ. 

9.2.4 Analysis of Higher-Order Programs 

Now that we have seen how to extend the abstract domains and the abstract interpreter in order 
to handle higher-order functions, let us see how this affects analysis of programs containing 
higher-order functions. There are a number of ways this can affect the analysis and transformation of 
such programs. It can cause loss of information, because we have less idea what computation will 
be performed by an expression. It can also cause added complexity in the analysis, because it 
is harder to construct representative input values. But, by exposing a higher-order function as a 
closure (a data structure), we have enabled the compiler to perform storage management on 
closures themselves. 

E_A⟦ ^l MakeClosure_fᵢ(se₁, …, seₙ) ⟧ ρ σ α Φ = 
    { v₁ = SE_A⟦ se₁ ⟧ ρ; 
      … 
      vₙ = SE_A⟦ seₙ ⟧ ρ; 
      v_cls = ⟨Cls fᵢ, v₁, …, vₙ⟩; 
      ol = α : l; 
      σ′ = σ[ol → σ[ol] ⊔ v_cls]; 
    in ⟨{ol}, σ′, ∅, ⊥_Dmap⟩ } 

Figure 9.4: Abstract evaluation of the closure constructor 

E_A⟦ ^k Apply(se, seₙ₊₁, …, seᵣ) ⟧ ρ σ α Φ = 
    { ls = SE_A⟦ se ⟧ ρ; 
      vₙ₊₁ = SE_A⟦ seₙ₊₁ ⟧ ρ; 
      … 
      vᵣ = SE_A⟦ seᵣ ⟧ ρ; 
      α′ = MAC(α, k); 
      ⟨v′, σ′, A⁻′, Δ′⟩ = 
          ⨆_{ol ∈ ls} { ⟨Cls fᵢ, v₁, …, vₙ⟩ = σ[ol]; 
                        ⟨v″, σ″, A⁻″⟩ = Φ[fᵢ][⟨v₁, …, vₙ, vₙ₊₁, …, vᵣ, σ, α′⟩]; 
                        Δ″[fᵢ] = {⟨v₁, …, vₙ, vₙ₊₁, …, vᵣ, σ, α′⟩}; 
                      in ⟨v″, σ″, A⁻″, Δ″⟩ }; 
    in ⟨v′, σ′, A⁻′, Δ′⟩ } 
    where each fᵢ(x₁, …, xᵣ) = eᵢ is a definition in the program 

Figure 9.5: Abstract evaluation of closure application 

We may lose information about the lifetime of an object created within a procedure if it is passed to 
a higher-order function because we may have to make worst case assumptions about the behavior 
of the function passed as an argument. 

In the algorithms described in Chapter 5, we began computation of the function environment by 
computing the value of the application of each function in the program to a representative input 
value. What representative values should we use for functions which take closures as input? What 
values should we use when analyzing the body of a function and verifying or inserting deallocation 
commands? 

During the analysis of a function body, we really do want to pass in some function value that 
captures the behavior of any function that could be passed in at run-time. Either we use the least 
upper bound of all possible values that arise under abstract interpretation, or we make a worst-case 
assumption about the behavior of the function. This value must satisfy the constraints of the type 
system, but all input values could conceivably be carried in the result of the application. Even 
worse, if I-structures or other side-effecting operations are supported, all input values (of the right 
type) could be side-effected or stored off in some structure reachable from the input values. 

The approach of constructing representative input values for higher-order inputs to functions only 
pays off if it allows us to manage the storage of closures. Otherwise, we may as well use the least 
upper bound of all possible values that could be passed as input to this function. 

If we look at the whole program, then we can actually determine the types of all the closures created 
in the program (assuming monomorphic typing), and use the set of all closures of the correct type 
as the input value to a function that takes closures as arguments. This process may be equivalent 
to taking the least upper bound of all possible inputs to a function that arise in the abstract 
interpretation described above, and analyzing the function when applied to this least upper bound. 
This process is similar to the behavior of collecting interpreters [24, 44]. 

It seems that it is better to use the most general function value that could ever be passed as input 
to a procedure during the analysis of that procedure than to construct representative closure values. 
We are likely to lose too much information if we use worst-case representative closure values rather 
than the closure values that arise during abstract interpretation. 

9.3 Example of Abstract Interpretation of Higher-Order Functions 

Let us consider the abstract interpretation of the following program. In the main procedure f0, one 
of two higher-order functions is called depending on the value of predicate p. What is the behavior 
of this program under the abstract interpreter? 

{ 
  def f0 = 
    { p = e0; 
      f = if p 
          then ^l0 MakeClosure_foo(10) 
          else ^l1 MakeClosure_bar(True, 3); 
      z = e1; 
      r = ^k0 Apply(f, z); 
    in r }; 

  def foo (n, m) = 
    ^l2 MakeTuple(n, n+m); 

  def bar (b, n, m) = 
    { x = if b then 3 else 4; 
      t = ^l3 MakeTuple(x, n-m); 
    in t }; 
} 

We would like to know what the result of the invocation of function f0 is under the abstract 
interpreter. The function, or closure, to which variable f is bound depends on the value of 
variable p, a run-time value. Therefore, we must abstract the behavior of f over all executions. 
If we evaluate the bindings of the letrec block in the body of f0, we get the following environment 
and store: 

ρ′ = ⊥_Env[ p → B,
            f → {l₀, l₁},
            z → N,
            r → {l₂, l₃} ]

σ′ = ⊥_Store[ l₀ → ⟨Cls foo, N⟩,
              l₁ → ⟨Cls bar, B, N⟩,
              l₂ → ⟨Tuple N, N⟩,
              l₃ → ⟨Tuple N, N⟩ ]

We can see by examining ρ′ and σ′ that f can be bound to a value which is either a closure of foo 
over a number or a closure of bar over a boolean and a number. In order to obtain the value of 
variable r, the interpreter had to evaluate the application of foo to two numbers and the 
application of bar to a boolean and two numbers. 



Chapter 10 



Performance Analysis 



This chapter discusses the performance of an implementation of the analysis and transformations 
described in this thesis. The first section discusses our implementation of the verification and 
insertion algorithms, and Monsoon [36], the machine on which we ran our benchmarks. The second 
section presents the experiments themselves. The third section presents an optimization that 
generates code to allocate structures in procedure activation frames whenever possible and discusses 
how this affects the run-time performance of programs. The fourth section describes the difficulty of 
deallocating structures that may pass through zero-tripping loops, loops that execute zero or more 
times. The fourth section also describes a code generation strategy that can solve this problem. The 
fifth section describes an optimization that hoists matched allocation and deallocation commands 
out of loops in order to reduce the run-time overhead of storage management. 

10.1 Implementation Details 

Most of the theory developed in this thesis has actually been put into practice. We have an 
implementation of the abstract interpreter, the deallocation command verification algorithm, and the 
deallocation command insertion algorithm. This section describes the details of our implementation 
and the structure of the experiments we used to determine the overall effectiveness of our methods. 

10.1.1 Implementation of the Verification and Insertion Algorithms 

Our implementation of the deallocation command verification and insertion algorithms handles 
tuples, arrays, algebraic types, lists, and I-structures as well as a number of scalar types: booleans, 
integers, floating point numbers, characters, and symbols. The implementation uses activation 
labels similar to those described in Chapter 6, but higher-order functions are not supported. The 
implementation handles conditionals, loops, and the limited form of barriers shown in this thesis. 
The current implementation of the compiler does not insert conditional deallocation commands, 
but it does attempt to get complete coverage of deallocatable structures using a greedy algorithm 
and a careful ordering of the identifiers whose values may be deallocated. 

The deallocation command verification and insertion tools are implemented as two new modules 
in the Id Compiler [40]. Both modules operate on program graphs, which are basically a dataflow 
graph representation of KID⁻. The first module computes the function environment for the whole 
program and verifies and annotates each function definition. The second module walks over the 
program again, and actually inserts deallocation commands and barriers where the first module 
annotated the graph. 

The compiler uses the behavior of each function over its representative input as the behavior of 
that function over any input, as described in Section 5.2. The compiler must compute the behavior 
of all mutually recursive procedures together, but in general computes the entries in the function 
environment in an order determined by a topological sort of the recursive-set nodes in the program. 
A recursive-set node consists of either a function alone, for non-recursive functions, or a function and 
all of the functions it calls recursively, for recursive functions. This allows the compiler to compute 
the function environment entry for each function fᵢ before it analyzes any non-recursive call to fᵢ. Computation 
of input-output mappings for each recursive-set in topological order also speeds up analysis by 
making the function environment converge faster. The analysis module takes time proportional to 
the number of recursive-sets and time quadratic in the size of the recursive-sets. 

In more detail, the first module computes the call graph of the program. From the call graph, the 
compiler determines the recursive-sets of the program and the order in which function environment 
entries must be computed. The compiler then generates the representative inputs and computes 
the function environment in topological order. 
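
One way to obtain recursive-sets in a suitable order is a strongly-connected-components pass over the call graph. The Python sketch below uses Tarjan's algorithm, which happens to emit each component after the components it calls; the source does not say which algorithm the compiler uses, and the call graph and function names shown are invented for illustration.

```python
def recursive_sets(call_graph):
    """Tarjan's algorithm: emit each recursive-set after the sets it calls."""
    index, low, on_stack, stack, order = {}, {}, set(), [], []

    def visit(f):
        index[f] = low[f] = len(index)
        stack.append(f)
        on_stack.add(f)
        for g in call_graph.get(f, ()):
            if g not in index:
                visit(g)
                low[f] = min(low[f], low[g])
            elif g in on_stack:
                low[f] = min(low[f], index[g])
        if low[f] == index[f]:          # f is the root of a recursive-set
            group = set()
            while True:
                g = stack.pop()
                on_stack.discard(g)
                group.add(g)
                if g == f:
                    break
            order.append(frozenset(group))

    for f in call_graph:
        if f not in index:
            visit(f)
    return order

# Hypothetical call graph: main calls a mutually recursive pair and scale_list.
call_graph = {
    "main": ["even", "odd_entry"],
    "even": ["odd"],
    "odd": ["even"],
    "odd_entry": ["scale_list"],
    "scale_list": ["scale_list"],
}
order = recursive_sets(call_graph)
```

Processing `order` from front to back means that by the time a caller is analyzed, the function environment entries of every non-recursive callee are already available.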

Next, the compiler visits each procedure and applies first the deallocation command verification 
algorithm and then the insertion algorithm. Any time a potentially unsafe deallocation command 
is found, the compiler issues a warning with as much identification information as possible. 

The insertion algorithm works on one control region at a time. Control regions in the program 
graph correspond to the bodies of procedures, the branches of conditional and case expressions, 
and the code before a barrier. Each of these regions must have been a letrec block in the original 
KID⁻ code. 

Within each control region, the compiler determines all of the output ports (which correspond to 
the definition of an identifier in the letrec block) that will produce structures whose lifetime is 
definitely contained by that of the control region. These ports are then sorted by the size of the 
sets of labels to which they may be bound. Any port whose label set contains another port's label 
is discarded. This process is repeated until we are left with a set of ports whose label sets are 
disjoint. The compiler then inserts deallocation commands on each of these ports. 
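
The port selection step can be sketched as a greedy pass in Python. This is only one possible reading of the description above: it assumes that ports are considered in order of increasing label-set size and that a port is discarded as soon as its label set overlaps the labels of a port already kept. The function and port names are hypothetical.

```python
def select_ports(ports):
    """ports maps a port name to the set of labels it may be bound to.

    Returns the names of ports whose label sets are pairwise disjoint.
    """
    chosen, covered = [], set()
    # Assumption: smaller label sets are considered first; ties break by name.
    for name in sorted(ports, key=lambda p: (len(ports[p]), p)):
        labels = ports[name]
        if labels and covered.isdisjoint(labels):
            chosen.append(name)
            covered |= labels
    return chosen

ports = {"x'": {0, 1}, "t": {1}, "u": {2, 3}, "w": {0, 1, 2, 3}}
chosen = select_ports(ports)
```

Disjointness of the surviving label sets is the property that matters for safety, since it prevents two inserted deallocation commands from freeing the same abstract location.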

Any time the compiler inserts a deallocation command, it informs the programmer where the 
deallocation command was inserted. If the components of a structure can be deallocated, the compiler 
will insert selection and deallocation code for these elements. The compiler has special cases for 
inserting code to deallocate arrays and their components (shared or unshared) and to deallocate lists 
recursively (cyclic or acyclic). The current implementation does not insert conditional deallocation 
commands. 

In addition to the two modules that implement the deallocation verification and insertion 
algorithms, there is a module that generates code to allocate structures in activation frames rather 
than the heap. This module finds structures of static size that are allocated and deallocated within 
the same control region and changes them to be frame allocated. Restricting this module to apply 
only to structures allocated and deallocated within the same control region, rather than within 
the same procedure, limits its usefulness slightly. Nevertheless, this module is fairly effective at 
converting general allocation and deallocation code into frame-based allocation and deallocation 
code. The restriction that the sizes of frame-allocated objects must be known at compile-time is 
imposed by the Id Run Time System (Id-RTS) [41] on the Monsoon dataflow machine [36], which 
must know the complete size of an activation frame before a procedure is called. We discuss the 
effectiveness of this optimization in Section 10.3. 

10.1.2 Monsoon 

Monsoon [36] is a dataflow machine with an explicit token store. Instead of using a hashing 
function to match the token pairs associated with instruction instances, each instruction has an 
explicit address (relative to an activation or frame pointer) where operand matching occurs. 

All of the experiments described in this chapter were run on a configuration of Monsoon hardware 
consisting of one processor and one I-structure unit. Each monsoon processing element (PE) 
contains 256K 32-bit words of instruction memory, 256K 64-bit words of data memory used for 
activation frames, and 256K element token queues. The processor consists of an eight stage pipeline 
operating at 10 MHz. Eight different threads of computation are interleaved in the pipeline. 

Each I-structure (IS) unit consists of 4M 64-bit words of data memory. Each word of data memory 
on both the PE and IS boards has an associated 3 presence-bits and 8 type-bits. The presence bits 
indicate whether a word of memory is empty or present and are the basic mechanism for fine-grain 
synchronization on Monsoon. The presence bits in activation frames are used for operand matching 
while the presence bits in heap memory are used to implement I-structure semantics. 

Monsoon is heavily instrumented. Each processor has a statistics processor, containing 64 statistics 
registers, that counts on a cycle-by-cycle basis what type of operations were executed and to what 
group of procedures those operations belonged. One of these counters is incremented every cycle. 
The counters are divided into 8 banks of 8 counters. The counter to be incremented is determined 
by the operation type and a 3-bit color field from a executing token's continuation. For most 
operations, the 3-bit color field is used to choose one of the first 7 banks of counters, and the 
operation type is used to choose one of the 8 counters in the chosen bank. Events such as idle 
pipeline cycles are counted in the last bank of 8 counters. 
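
The selection of a counter can be sketched as follows. This is only an illustration of the scheme described above: the function name counter_index, the constant IDLE_BANK, and the encoding of operation types and colors as small integers are assumptions, not taken from the Monsoon hardware documentation.

```python
BANKS = 8               # 8 banks of 8 counters = 64 statistics registers
PER_BANK = 8
IDLE_BANK = BANKS - 1   # assumed: the last bank holds events such as idle cycles

def counter_index(color, op_type, idle=False):
    """Return which of the 64 counters is incremented on this cycle."""
    if not 0 <= op_type < PER_BANK:
        raise ValueError("operation type must select one counter in a bank")
    if idle:
        return IDLE_BANK * PER_BANK + op_type
    if not 0 <= color < IDLE_BANK:
        raise ValueError("color must select one of the first 7 banks")
    return color * PER_BANK + op_type

counters = [0] * (BANKS * PER_BANK)
cycles = [(0, 3, False), (0, 3, False), (6, 7, False), (0, 0, True)]
for color, op_type, idle in cycles:
    counters[counter_index(color, op_type, idle)] += 1   # one counter per cycle
```

Because exactly one counter is incremented per cycle, the sum of all counters equals the number of cycles executed.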

These statistics counters allow us to measure the utilization of the machine very precisely. We can 
account for how much time is spent in the user's program, how much is spent in the Run-Time
System (RTS), and how much is spent with the processor idle. We use the statistics counters to 
measure the performance of our examples. 

10.1.3 Id Run-Time System on Monsoon 

The version of the run-time system that we used when running these experiments consisted of a 
frame manager and a heap manager. The frame manager uses a single free list to manage unused 
activation frames, and so it only allocates one size of activation frame. The run-time system is 
initialized so that this frame size is large enough for all procedures. 

The heap manager uses the quick-fit algorithm [43] to manage deallocated storage. This algorithm 
incurs one word of overhead for all objects that are allocated. This overhead is insignificant for large 
objects, but is significant for small objects such as cons cells. Under this management strategy, 
cons cells take three words apiece. 
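
The effect of the per-object header can be seen in a toy allocator. The sketch below is a simplification written for illustration, not the Id run-time system's heap manager: it keeps one free list per block size and carves fresh storage with a bump pointer, and it omits the general free list and any coalescing that a full quick-fit implementation would have. The class and method names are hypothetical.

```python
class QuickFitHeap:
    """Toy quick-fit heap: one free list per block size, plus a bump pointer."""
    HEADER_WORDS = 1  # one word of overhead per object, as described above

    def __init__(self, total_words):
        self.total_words = total_words
        self.next_free = 0      # start of storage that has never been used
        self.free_lists = {}    # block size in words -> list of free addresses
        self.sizes = {}         # address -> block size (stands in for the header)
        self.in_use = 0         # words currently allocated, headers included

    def allocate(self, payload_words):
        block = payload_words + self.HEADER_WORDS
        bucket = self.free_lists.get(block)
        if bucket:                        # fast path: reuse a block of this size
            addr = bucket.pop()
        else:                             # slow path: carve fresh storage
            if self.next_free + block > self.total_words:
                raise MemoryError("heap exhausted")
            addr = self.next_free
            self.next_free += block
        self.sizes[addr] = block
        self.in_use += block
        return addr

    def deallocate(self, addr):
        block = self.sizes.pop(addr)
        self.free_lists.setdefault(block, []).append(addr)
        self.in_use -= block

heap = QuickFitHeap(total_words=1000)
cell = heap.allocate(2)   # a cons cell: two payload words plus one header word
```

In this model a two-word cons cell occupies three words, which matches the overhead described above, while the same single header word is negligible for a large matrix.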

All structures in Id are implemented as I-structures. Each word of an I-structure has presence-bits 
that indicate whether that word is empty or present. Stores cause the presence-bits of a word to go 
from empty to present as well as changing the value of the word. Fetches issued against an empty 
word defer until a value is stored in that word. 
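
The behavior of a single I-structure can be modeled in a few lines. The sketch below is an illustrative model only: the class name IStructure, the callback-based fetch, and the error raised on a second store to the same word are assumptions, and real deferred fetches are handled by the I-structure hardware rather than by Python lists.

```python
class IStructure:
    """Toy I-structure: every word is either empty or present."""
    _EMPTY = object()

    def __init__(self, n_words):
        self.words = [self._EMPTY] * n_words
        self.deferred = [[] for _ in range(n_words)]  # readers waiting per word

    def fetch(self, i, reader):
        """Deliver word i to `reader` now if present, otherwise when it is stored."""
        if self.words[i] is self._EMPTY:
            self.deferred[i].append(reader)    # fetch against an empty word defers
        else:
            reader(self.words[i])

    def store(self, i, value):
        if self.words[i] is not self._EMPTY:
            raise RuntimeError("word already present")  # assumed single-assignment rule
        self.words[i] = value                  # empty -> present
        waiting, self.deferred[i] = self.deferred[i], []
        for reader in waiting:                 # satisfy the deferred fetches
            reader(value)

seen = []
cell = IStructure(2)
cell.fetch(0, seen.append)   # deferred: word 0 is still empty
cell.store(0, 42)            # releases the deferred fetch
cell.fetch(0, seen.append)   # word 0 is present, so this is answered at once
```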

One of the duties of the heap manager is to clear the presence-bits of each word of memory to
empty. The current Id RTS clears the presence-bits of all words of the heap during an initialization 
phase before a program is executed. During program execution, presence-bits are cleared whenever 
an object is deallocated. The heap manager maintains the invariant that all of the presence-bits of 
free memory are empty. 

In steady-state, when as many objects are being deallocated as are allocated, it does not matter 
whether presence-bits are cleared upon allocation or upon deallocation. However, if presence-bits 
are cleared upon deallocation, the difference in run-time between programs that reclaim storage 
and those that never reclaim storage can be significant — programs that do not reclaim storage 
are not charged for clearing the presence-bits of the I-structures that they allocate. Under this 
strategy, a program that does not reclaim storage will have better performance than one that does 
reclaim storage unless it runs out of memory. 
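
The asymmetry can be made concrete with a simple count of the words whose presence-bits are cleared at the program's expense. The following sketch is a hypothetical cost model, not a measurement: the function words_cleared, the event encoding, and the example traces (40 matrices of 900 words each) are assumptions chosen for illustration.

```python
def words_cleared(trace, policy):
    """Count the presence-bit clears charged to a program.

    `trace` is a list of ("alloc", n_words) or ("dealloc", n_words) events;
    `policy` is "on_alloc" or "on_dealloc".
    """
    charged = "alloc" if policy == "on_alloc" else "dealloc"
    return sum(n for event, n in trace if event == charged)

never_reclaims = [("alloc", 900)] * 40
steady_state = [("alloc", 900), ("dealloc", 900)] * 40

cost_never = {p: words_cleared(never_reclaims, p) for p in ("on_alloc", "on_dealloc")}
cost_steady = {p: words_cleared(steady_state, p) for p in ("on_alloc", "on_dealloc")}
```

In this model the steady-state program pays the same amount under either policy, while the program that never reclaims storage pays nothing when clearing is done upon deallocation.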

We find that most of our programs that allocate and deallocate approximately equal amounts of 
storage spend about half their time in the run-time system. Of the time spent executing run-time 
system code, half is spent clearing presence-bits, and the other half is spent manipulating the data 
structures that keep track of free and allocated storage. 

The activation frame and heap managers both contain code to record the maximum amount of 
storage that was allocated and the current amount of storage allocated. We use this code to gather 
statistics about the amount of storage used by our example programs. 

10.1.4 Structure of the Experiments 

For each program we studied, we determined storage usage and execution time without storage 
deallocation, and storage usage and execution time with the best hand-inserted deallocation. Then 
we recompiled the programs to verify the hand-inserted deallocation commands, recording the per- 
cent increase in compile-time and the static percentage of deallocation commands verified. We also 
recompiled the original programs to insert deallocation commands automatically, again recording 
the percent increase in compile-time and the static percentage of deallocation commands inserted. 
Finally, we ran the programs again to determine dynamic storage usage and execution time for 
the programs with verified deallocation commands only and automatically inserted deallocation 
commands only. 



10.2 Performance Measurements 

This section describes the compile-time performance of our implementation of the verification and 
insertion algorithms. It also describes the run-time performance of the various versions of the com- 
piled code. The first example described is the Wavefront benchmark. Wavefront is an example we 
use to illustrate the use of non-strictness in the definition of relaxation programs. The second exam- 
ple described is the Simple hydrodynamics benchmark. Both Wavefront and Simple are programs 
with very static structure. Both of these programs use arrays as their major data structure. The 
third example described is the Gamteb benchmark. This example has a more dynamic structure, 
because the heart of the simulation is a set of 7 mutually recursive procedures. Gamteb allocates 
a large number of tuples as it simulates the trajectories of photons in a carbon rod. 

def multiwave edge_vector n =
  { m = initial_wave edge_vector;
    r = {for i <- 1 to n do
           next m = wave m;
         finally m }
  in r };



Figure 10.1: The code for multiwave 



def multiwave edge_vector n =
  { m = initial_wave edge_vector;
    r = {for i <- 1 to n do
           next m = wave m;

           Dealloc(m);
         finally m }

    _ = if (1 <= n) then Dealloc(m);
  in r };



Figure 10.2: The annotated code for multiwave 



10.2.1 The Wavefront Benchmark 

The Wavefront benchmark is a simple example used to test automatic storage reclamation. The
outer loop of the example is shown in Figure 10.1. Procedure initial_wave allocates a matrix, and
in each iteration procedure wave reads matrix m and creates a new matrix. The matrix passed into
each iteration of the loop is garbage upon termination of that iteration. The analyzer correctly
determines this and allows the compiler to generate the code in Figure 10.2.

We can reclaim the storage associated with the value of initial_wave whenever the loop executes 
at least once. 

The following table contains the compile-times for the Wavefront benchmark. The four versions of
the program are Wavefront_NA, Wavefront_HA, Wavefront_VF and Wavefront_AA. Wavefront_NA is the
original version, without any deallocation commands. This program was compiled by the unmodified
Id compiler. Wavefront_HA is a hand-annotated version that contains deallocation commands
that were inserted manually. It was also compiled with the unmodified Id compiler. Wavefront_VF
is the hand-annotated version as compiled by the Id compiler with the lifetime analysis and
deallocation verification module. All unsafe deallocation commands are removed by the compiler.
Wavefront_AA is the unannotated version of the Wavefront program compiled with both the lifetime
analysis and deallocation insertion modules. The number of deallocation commands is a static
count of all of the deallocation commands in the program.

Program          Compile-Time    Deallocs
                 (seconds)       (number)

Wavefront_NA     18              0
Wavefront_HA     18              3
Wavefront_VF     32              2
Wavefront_AA     32              2



The hand-annotated version contains three deallocation commands to deallocate the edge vector,
the first matrix, and each intermediate matrix. The compiler-verified and compiler-annotated
versions contain two deallocations: one for the edge vector, and one for the intermediate matrices.
The compiler cannot determine that the first matrix will not be returned as the result, so it cannot
insert code to deallocate that matrix. A programmer can insert conditionals to prevent error in
this case.

The following table describes the run-time performance of the four versions of Wavefront. Each 
program was run for 40 iterations on a 30 X 30 matrix. The table gives the total run-time for each
program, as well as the maximum amount of storage that was allocated, in words, and the final 
number of words of storage that were still allocated when the programs terminated. 



Program          Run-Time     Max Storage    Final Storage
                 (seconds)    (words)        (words)

Wavefront_NA     0.193        37,225         37,225
Wavefront_HA     0.349        10,000         907
Wavefront_VF     0.336        10,000         1,814
Wavefront_AA     0.375        10,000         941



The original version of this program runs the fastest, but it also uses the most storage. The
hand-annotated version takes 81% longer. However, it deallocates all but the final matrix. The main
reason the versions containing deallocation code take longer to execute is that the deallocation
code must clear the presence-bits of the objects being deallocated.
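
The relative slowdowns can be recomputed directly from the run-time table. The short sketch below only restates that arithmetic; the dictionary keys NA, HA, VF and AA and the helper percent_longer are names chosen here for illustration.

```python
# Run-times in seconds, copied from the Wavefront run-time table above.
run_time = {"NA": 0.193, "HA": 0.349, "VF": 0.336, "AA": 0.375}

def percent_longer(version, baseline="NA"):
    """Extra run-time of `version` relative to `baseline`, in percent."""
    return 100.0 * (run_time[version] - run_time[baseline]) / run_time[baseline]

slowdown = {v: round(percent_longer(v)) for v in ("HA", "VF", "AA")}
```

With these figures the hand-annotated version comes out at 81% longer, in agreement with the text, and the other two versions fall in the same range.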

The compiler-verified and compiler-annotated versions deallocate all but the first and last matrices.
Deallocation of the first matrix cannot be verified, because if we execute zero iterations, the first 
matrix is returned as the result, and the compiler cannot prove that we execute more than zero 
iterations. We discuss this problem in more detail in Section 10.4. 



10.2.2 Simple 

Simple, a hydrodynamics benchmark program [13], is a scientific program with very simple control 
structure. If compiler-directed storage reclamation is going to have any success, it should be able to 
reclaim every intermediate structure allocated in this program. In fact, our first implementation of 
the program annotator, which did not handle nested structures, had very good success on Simple. 
It inserted Dealloc statements that deallocated seventy percent (dynamically) of the structures 
allocated by the program at run-time. Unfortunately, these were tuples that contained a number of
large matrices, and so this was only a small fraction of the total storage allocated (thirty percent
for a problem size of ten by ten).

The following table contains the compile-times for the Simple benchmark under four conditions:
not annotated (NA), hand annotated (HA), verified safe deallocation commands only (VF) and
automatically annotated (AA).

Program        Compile-Time    Deallocs
               (seconds)       (number)

Simple_NA      409             0
Simple_HA      437             70
Simple_VF      863             58
Simple_AA      894             58



Compilation of the hand-annotated version of Simple took slightly longer than the original version,
while compilation with the lifetime analyzer and the deallocation command verifier or inserter
took about twice as long as compilation of the original program.

The twelve deallocation commands (70 - 58) that could not be verified as safe were all potentially
unsafe because they deallocated structures that may escape if a loop executed zero iterations. These 
deallocation commands in version HA are actually safe, because the loop never executes fewer than 
one iteration. 

The following table contains information about the run-time performance of the four versions of 
Simple. Each version was run twice: once for 20 iterations of a 50 X 50 matrix, and once for 40 
iterations of a 50 X 50 matrix. 



Program      Size    Iters    Time         Max Storage    Final Storage
                              (seconds)    (words)        (words)

Simple_NA    50      20       38.9         1,678,867      1,678,867
Simple_NA    50      40       77.0         3,324,447      3,324,447
Simple_HA    50      20       51.5         114,147        40,941
Simple_HA    50      40       102.6        114,147        40,941
Simple_VF    50      20       51.5         114,147        58,609
Simple_VF    50      40       102.6        114,147        58,609
Simple_AA    50      20       51.6         114,147        58,609
Simple_AA    50      40       102.5        114,147        58,609



Each version that contains deallocation commands took about 33% longer to run than the version
that had no deallocation commands. However, each of these versions deallocated 93% to 97% of the storage
that they allocated. Each of the three versions containing deallocation commands reclaims all of 
the storage allocated during each iteration. The only difference in the amount of storage that they 
use is in how much of the storage allocated for initial data structures is eventually reclaimed. 

10.2.3 Gamteb 



Gamteb [8], a Monte Carlo simulation of photon transport in a graphite rod, is another scientific 
program on which this system should have good success. The Id version has a slightly more complex 
structure than the original Fortran: the Id version uses a recursive procedure to simulate particle 
transport. This recursive procedure is called from a parallel outer loop. Each recursive procedure 
is called with a new particle and returns a new tuple of counts. The particle tuples passed in can 
be deallocated upon termination of the recursive call, and the count tuples returned as the result 
of the recursive call are read and may be deallocated upon termination of each invocation of the 
outer loop. 

A version of Gamteb with hand-inserted deallocation commands contained 38 deallocation commands.
The compiler verifies the safety of 37 of these deallocation commands. The compiler fails
to verify one deallocation command that reclaims a structure that may be passed through a
zero-tripping loop. The compiler can insert 35 deallocation commands. It fails to insert two
deallocation commands that reclaim structures that may be passed through zero-tripping loops.

{ def frame_tuple(x,y) =
    k1 { ft = k2 MakeTuple(x,y);
         result = k3 g(ft);
         in k4 result };

  def g(t) =
    k5 { r = k7 Select1(t)
         in k8 r };

  in k9 frame_tuple(68,47) }

Figure 10.3: Frame allocated tuple example

The following table contains the compilation times for the four versions of Gamteb. 



Program        Compile-Time    Deallocs
               (seconds)       (number)

Gamteb_NA      158             0
Gamteb_HA      183             36
Gamteb_VF      976             34
Gamteb_AA      980             34



The following table contains information about the run-time performance of the four versions of 
Gamteb. 



Program      N       Run-Time     Max Storage    Final Storage
                     (seconds)    (words)        (words)

Gamteb_NA    1000    10.9         982,315        982,315
Gamteb_NA    2000    20.3         1,839,952      1,839,952
Gamteb_HA    1000    18.0         4,710          132
Gamteb_HA    2000    33.6         5,100          132
Gamteb_VF    1000    17.9         50,000         48,098
Gamteb_VF    2000    33.6         90,000         89,398
Gamteb_AA    1000    17.7         50,000         48,098
Gamteb_AA    2000    33.1         90,000         89,398



10.3 Transformation to Frame Allocation 



When the compiler finds a structure that is allocated and deallocated in the same control region, it 
can transform the heap allocation into frame allocation. Deallocation of the structure then happens 
automatically when the procedure exits. In other words, the compiler sets aside enough storage in 
the activation frame of the procedure to contain the structure. In some implementations, such as 
the implementation of Id on Monsoon, this is only possible if the structure size is known statically. 

{ def frame_tuple(x,y) =
    k1 { ft = k2 MakeFrameTuple(x,y);
         result = k3 g(ft);
         _ = CleanupFrameTuple(ft);
         in k4 result };

  def g(t) =
    k5 { r = k7 Select1(t)
         in k8 r };

  in k9 frame_tuple(68,47) }

Figure 10.4: Frame allocated tuple example with transformation



In other implementations, where activation frames are stack allocated, the procedure may be able
to dynamically allocate space in its activation frame by adjusting its stack pointer.

Procedure frame_tuple shown in Figure 10.3 contains a tuple bound to identifier ft that may be
frame allocated, because the structure allocated by the MakeTuple expression in procedure
frame_tuple does not escape from the invocation of frame_tuple.

Figure 10.4 contains the transformed code for this example. The primitive MakeFrameTuple allocates
a tuple in the frame. The semantics of the tuple is exactly the same as for
a heap-allocated tuple, except that the storage is automatically reclaimed upon termination of the 
procedure frame_tuple. The primitive CleanupFrameTuple performs any cleanup required by the 
run-time system. The Id run-time system requires that all frames be empty when returned, and so 
CleanupFrameTuple clears out the storage used by the tuple. 
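
The transformation can be modeled as follows. This is a sketch under stated assumptions rather than the code the compiler generates: the class ActivationFrame, its method names, the use of None for an empty word, and the reading of the selector in g as "first component" are all choices made here for illustration.

```python
EMPTY = None  # stands in for a word whose presence bits read "empty"

class ActivationFrame:
    """Hypothetical frame with extra words set aside for one tuple."""

    def __init__(self, n_locals, n_tuple_words):
        self.words = [EMPTY] * (n_locals + n_tuple_words)
        self.tuple_base = n_locals        # the tuple lives after the locals
        self.tuple_len = n_tuple_words    # size must be known statically

    def make_frame_tuple(self, *fields):
        if len(fields) != self.tuple_len:
            raise ValueError("tuple size differs from the space reserved")
        for i, value in enumerate(fields):
            self.words[self.tuple_base + i] = value
        return self.tuple_base            # the "pointer" is just a frame offset

    def select(self, base, i):
        return self.words[base + i]

    def cleanup_frame_tuple(self, base):
        for i in range(self.tuple_len):   # frames must be empty when returned
            self.words[base + i] = EMPTY

def frame_tuple(x, y):
    frame = ActivationFrame(n_locals=2, n_tuple_words=2)
    ft = frame.make_frame_tuple(x, y)
    result = frame.select(ft, 0)          # plays the role of g(ft)
    frame.cleanup_frame_tuple(ft)
    return result, frame

result, frame = frame_tuple(68, 47)
```

No call to the heap manager appears anywhere in this model: the tuple's storage is obtained and released together with the frame.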

The following table summarizes the results when we compile the hand annotated version of Gamteb 
with the frame allocation optimization enabled: 



Program        Compile-Time    Deallocs
               (seconds)       (number)

Gamteb_HA      183             38
Gamteb_HAFA    183             38



The following table contains information about the run-time performance of the Gamteb benchmark 
compiled with the frame allocation optimization enabled. 



Program        N       Run-Time     Max Storage    Final Storage
                       (seconds)    (words)        (words)

Gamteb_HA      1000    18.0         4,710          132
Gamteb_HAFA    1000    15.7         3,510          132
Gamteb_HAFA    2000    29.4         3,500          132



The version of Gamteb that uses frame allocation runs 13% faster than the hand-annotated version
without frame allocation, and uses less total storage. The optimization itself is very
straightforward and does not increase compile-time noticeably.

10.4 Handling Possibly Zero-Tripping Loops

A common idiom in functional implementations of scientific programs is a structure that is created 
and then successively refined in a loop or tail recursion. Often, only the final value is needed, and 
the initial value and all intermediate values can be reclaimed. However, if the compiler cannot 
determine that the loop will execute at least once, then it cannot tell that the final value could not 
be the initial value, and the initial value will never be reclaimed by the compiler. 

Here is such an example: 

def multiwave ev k =
  { M = initial_wave ev;
  in {for i <- 1 to k do
        next M = wave M;
      finally M }};

The initial value of M, allocated by initial_wave, will be returned as the final value of the loop if 
the value of k is less than one. 

We can provide run-time checking to ensure that the initial matrix is only deallocated if it is not 
returned as the result by testing the initial value of the loop predicate. The following code has this 
transformation. 

def multiwave ev k =
  { M = initial_wave ev;
    r = {for i <- 1 to k do
           next M = wave M;
         finally M }
    ---
    _ = if k > 0 then deallocate M;
  in r };

The code after the barrier deallocates the initial copy of M if k is at least one. In Wavefront,
this optimization only reclaims one object, so it is not very interesting. We applied the same
optimization to Gamteb with much more spectacular results.
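
The guard can be exercised in a small sequential model. The sketch below is not Id and ignores parallelism and non-strictness; the Heap class and the integer object identifiers are assumptions made for illustration. It shows that the initial object is released exactly when the loop body runs at least once, and is kept when it is returned as the result.

```python
class Heap:
    """Minimal allocation tracker (hypothetical); objects are integer ids."""

    def __init__(self):
        self.live = set()
        self.next_id = 0
        self.peak = 0

    def alloc(self):
        obj = self.next_id
        self.next_id += 1
        self.live.add(obj)
        self.peak = max(self.peak, len(self.live))
        return obj

    def dealloc(self, obj):
        self.live.remove(obj)   # a double deallocation raises KeyError

def multiwave(heap, k):
    initial = heap.alloc()      # M = initial_wave ev
    m = initial
    for _ in range(1, k + 1):
        nxt = heap.alloc()      # next M = wave M
        if m != initial:
            heap.dealloc(m)     # intermediate matrices are garbage
        m = nxt
    if k > 0:                   # run-time test of the initial loop predicate
        heap.dealloc(initial)   # safe: the result is some later matrix
    return m

looping, zero_trip = Heap(), Heap()
r5 = multiwave(looping, 5)
r0 = multiwave(zero_trip, 0)
```

In both runs only the returned matrix remains allocated, and in the zero-trip run that matrix is the initial one, which is why an unconditional deallocation would be unsafe.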

The following table summarizes the performance of Gamteb when it is compiled with the zero-tripping
optimization turned on. The compile-time for the row labeled Gamteb_ZT includes the
time to perform the zero-tripping optimization. The compile-time of Gamteb_ZTFA includes both
the zero-tripping detection and frame-allocation optimizations.



Program        Compile-Time    Deallocs
               (seconds)       (number)

Gamteb_AA      980             34
Gamteb_ZT      980             36
Gamteb_ZTFA    981             36



This optimization takes very little time, but allows the compiler to add two more deallocation 
commands to Gamteb than it could without the optimization. These two deallocation commands, as 
we can see from the following table, reduce the storage used by Gamteb considerably. Furthermore, 
once these two deallocation commands have been added, Gamteb uses a constant amount of storage 
for any number of particles simulated. 

Program        N       Run-Time     Max Storage    Final Storage
                       (seconds)    (words)        (words)

Gamteb_HA      1000    18.0         4,700          132
Gamteb_AA      1000    17.7         50,000         48,098
Gamteb_AA      2000    33.1         91,100         89,398
Gamteb_ZT      1000    18.0         4,900          132
Gamteb_ZT      2000    33.6         4,900          132
Gamteb_ZTFA    1000    15.6         3,500          132
Gamteb_ZTFA    2000    29.3         3,700          132



These performance results show that the zero-tripping loop optimization is very important, even 
though it only inserts code to deallocate one structure per loop. 

We did similar experiments with Simple to see what difference it made to reclaim the storage from 
structures that may be returned as the result of a loop. The following table shows the compile 
times and the number of deallocations inserted. The ZT version of Simple is compiled with the ZT 
transformation, which inserts twelve additional deallocation commands. 



Program        Compile-Time    Deallocs
               (seconds)       (number)

Simple_AA      1100            58
Simple_ZT      1000            70



The following table summarizes the results of running the HA, AA, and ZT versions of Simple on 
a 50 X 50 problem size for 20 and 40 iterations. 



Program      Size    Iters    Time         Max Storage    Final Storage
                              (seconds)    (words)        (words)

Simple_HA    50      20       51.5         114,147        40,941
Simple_HA    50      40       102.6        114,147        40,941
Simple_AA    50      20       51.6         114,147        58,609
Simple_AA    50      40       102.5        114,147        58,609
Simple_ZT    50      20       51.6         114,147        40,941
Simple_ZT    50      40       102.5        114,147        40,941


Use of the ZT transformation allows the compiler-generated deallocation commands to reclaim as 
much storage as the hand-generated deallocation commands do. 



10.5 Examples Using Lists 



This section describes the experiments we did with list manipulating programs. The first example,
shown below, creates a list named l1 containing len integers. It then creates a list named l2 by
incrementing each element in l1 by n1. It creates another list l3 by scaling each element in l2 by
n2. Finally, it returns the sum of the elements of list l3.

def test len n1 n2 =
  { l1 = gen_list len;
    l2 = inc_list n1 l1;
    l3 = scale_list n2 l2;
    r = sum_list l3;
  in r };

The three list generating procedures gen_list, inc_list, and scale_list were written using list
comprehensions. In Id, a list comprehension is syntactic sugar that expands into a loop expression 
that generates a list. List comprehensions tend to make list manipulating programs more compact. 

The Id compiler inserts code that allocates one extra cons cell for each list comprehension. The 
extra cell simplifies the code that constructs the list, because it eliminates the extra testing that 
would be needed otherwise when generating an empty list. The lifetime of the extra cons cell 
is always bounded by the control region enclosing the list comprehension, but the standard Id 
compiler does not currently insert deallocation code for this extra cell. 
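
The role of the extra cell can be seen in a conventional linked-list builder. The sketch below is written in Python for illustration and is not the code the Id compiler emits; the names Cons, build_list and to_python are hypothetical. The header cell lets every element, including the first, be appended in the same way, at the cost of one cell that is dead as soon as the list has been built.

```python
class Cons:
    def __init__(self, head=None, tail=None):
        self.head = head
        self.tail = tail

def build_list(values):
    """Build a linked list using one extra header cell.

    Returns (list, header) so that the caller can see the extra cell,
    which is no longer needed once construction has finished.
    """
    header = Cons()            # extra cell: its head field is never used
    last = header
    for v in values:
        last.tail = Cons(v)    # no "is the list still empty?" test is needed
        last = last.tail
    return header.tail, header

def to_python(cell):
    out = []
    while cell is not None:
        out.append(cell.head)
        cell = cell.tail
    return out

lst, extra = build_list([1, 2, 3])
empty, extra_for_empty = build_list([])
```

The header cell never escapes the builder's result in a real comprehension, which is why its lifetime is bounded by the enclosing control region.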

The following table shows the compile-time performance of three versions of this program: no
annotations inserted (NA), hand inserted deallocation commands (HA), and automatically annotated
(AA). The compiler could not verify any of the hand inserted deallocation commands because they
are contained in procedures and violate the safety condition that we defined in Chapter 5. The
compiler has special cases for inserting code to deallocate lists, and these were used to generate the
automatically annotated version of the benchmark.



Program      Compile-Time    # Dealloc    # Deallocate_List
             (seconds)

List_NA      11              0            0
List_HA      15              0            3
List_AA      26              3            3



The hand annotated version of the List benchmark contains three calls to the procedure Deallocate_List, 
which deallocates all cells of a list. This procedure assumes that the list is acyclic. The hand an- 
notated version does not deallocate the extra cons cells allocated by the list comprehension code 
because there is no way to name these cells in the Id source code. The compiler annotated version 
of the List benchmark contains three Dealloc commands to reclaim extra cons cells allocated by 
the list comprehension code, as well as three calls to Deallocate_Cyclic_List, which deallocates 
all unique cells in a list. The compiler cannot determine that a list is acyclic, and so it inserts code 
that safely deallocates both cyclic and acyclic lists. 
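
The difference in cost between the two procedures can be sketched as follows. These are illustrative Python stand-ins, not the run-time system's routines: the source does not say how Deallocate_Cyclic_List recognizes cells it has already visited, so the set of visited cells used here is an assumption, as are the function signatures and the free callback.

```python
class Cell:
    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail

def deallocate_list(cell, free):
    """Free every cell of a list that is known to be acyclic."""
    count = 0
    while cell is not None:
        nxt = cell.tail     # read the tail before the cell is released
        free(cell)
        cell = nxt
        count += 1
    return count

def deallocate_cyclic_list(cell, free):
    """Free each unique cell once, even if the tail pointers form a cycle."""
    cells, seen = [], set()
    while cell is not None and id(cell) not in seen:
        seen.add(id(cell))  # extra bookkeeping that the acyclic version avoids
        cells.append(cell)
        cell = cell.tail
    for c in cells:
        free(c)
    return len(cells)

a, b, c = Cell(1), Cell(2), Cell(3)
a.tail, b.tail = b, c
freed_acyclic = []
n_acyclic = deallocate_list(a, freed_acyclic.append)

c.tail = a                  # close the list into a cycle
freed_cyclic = []
n_cyclic = deallocate_cyclic_list(a, freed_cyclic.append)
```

The acyclic version would loop forever on the second list, while the cyclic version pays for a membership test and an insertion on every cell, which is consistent with the longer run-times reported for the compiler-annotated benchmark.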

The following table contains information about the run-time performance of the three versions of 
the list manipulating benchmark. 



Program    Length     Run-Time     Max Storage    Final Storage
                      (seconds)    (words)        (words)

List_NA    1000       0.194        9,009          9,009
List_NA    100,000    19.3         900,009        900,009
List_HA    1000       0.426        9,009          9
List_HA    100,000    42.5         900,009        9
List_AA    1000       0.498        9,006          0
List_AA    100,000    49.3         900,006        0



Both versions of this benchmark that deallocate storage take more than twice as long as the original
code. The compiler-annotated version of this benchmark uses the least amount of storage, but takes
the longest because the code to deallocate a potentially cyclic list is more expensive than the code
to deallocate an acyclic list. The automatically annotated version has a lower maximum storage
because one of the extra cons cells was deallocated before both of the others had been
allocated.



10.6 Explicit Storage Reuse 

If the compiler finds a structure that is allocated in each iteration of a loop and deallocated in the 
following iteration, then the compiler can lift both the allocate and the deallocate out of the loop 
and explicitly reuse the structure. In some cases the compiler may have to allocate two or more 
structures outside of the loop and cycle through them. 

Consider the following example, where M is a matrix that is successively relaxed. In each iteration, 
a new version of M is created and an old one becomes garbage. Furthermore, the loop is bounded 
by parameter k — this allows up to k iterations of the loop to execute in parallel. Therefore, the 
space used by the loop should be bounded by k times the space requirements of a single iteration. 

def relax M size n_steps =
  {for i <- 1 to n_steps bound k do
     next M = {matrix (1,size),(1,size) of
               | [i,j] = relax_point M i j
               || i <- 1 to size & j <- 1 to size };
     Dealloc(M);
   finally M };

Although this version of the procedure reclaims all intermediate storage allocated, it calls the heap 
manager n_steps times to allocate storage and n_steps times to deallocate storage. We only ever 
need k instances of the matrix M at any point in time, and so we should be able to locally manage 
the storage in order to reduce the burden on the heap manager. We would like to specialize storage 
management whenever possible to increase the efficiency for particular uses of storage. 

The previous procedure definition can be transformed into the following code in order to reduce 
the overhead of storage management. 

def relax M size n_steps =
  { Ms = make_k_matrices ((1,size),(1,size)) k;
    R = {for i <- 1 to n_steps bound k do
           next M = Ms![i mod k];
           _ = {for i <- 1 to size do
                  {for j <- 1 to size do
                     next M[i,j] = relax_point M i j }};
           Ms![i mod k] = clear_matrix M;
         finally M };
    _ = free_k_matrices Ms;
  in R };

The procedure make_k_matrices takes the dimensions b of the matrix and the loop bound k and
returns an M-vector [7, 6] containing k empty matrices each with dimensions b. In each iteration,
the (i mod k)th element of the vector of empty matrices is taken and used as the value of next
M. Upon termination of the ith iteration, the current value of M is cleared and put back into the
(i mod k)th element of the vector of empty matrices. The vector of empty matrices and all of the
empty matrices are deallocated upon termination of the whole loop by the call to free_k_matrices.

This optimization is not currently implemented, but we expect it to be effective in reducing the 
run-time overhead of allocating and deallocating storage. 
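
The idea behind the transformation can be tried out in a sequential model. The sketch below is a hypothetical Python rendering, not the proposed Id code: it runs the iterations one after another, requires k to be at least two so that the matrix being read and the matrix being written are always distinct, and copies the result out of the pool before returning. The relax_point argument and the heap_calls counter are assumptions added for illustration.

```python
def make_k_matrices(size, k):
    """Allocate k empty size-by-size matrices once, outside the loop."""
    return [[[None] * size for _ in range(size)] for _ in range(k)]

def relax(m, size, n_steps, k, relax_point):
    if k < 2:
        raise ValueError("this sketch needs k >= 2 distinct buffers")
    pool = make_k_matrices(size, k)
    heap_calls = k                    # versus n_steps allocations without reuse
    for step in range(1, n_steps + 1):
        nxt = pool[step % k]          # take the (step mod k)th buffer
        for i in range(size):
            for j in range(size):
                nxt[i][j] = relax_point(m, i, j)
        m = nxt                       # the old buffer is simply overwritten later
    result = [row[:] for row in m]    # copy out before the pool is released
    return result, heap_calls

start = [[0, 0], [0, 0]]
final, calls = relax(start, 2, 5, 2, lambda m, i, j: m[i][j] + 1)
```

In this model the number of allocation requests depends only on k and not on n_steps, which is the saving the transformation is intended to achieve.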



Chapter 11 



Conclusion 



We have presented a method for performing object lifetime analysis on non-strict, parallel pro- 
grams. We have shown how to use this lifetime information to verify the correctness of deallocation 
commands in programs and to insert deallocation commands into programs. The central idea of 
this work is recognizing that object lifetimes can be derived from reachability information, and that 
interpreters can determine what objects are reachable from any point in the program. 

The crux of the analysis is the naming of objects. Object names must be related to program 
structure so that dynamic behavior can be related to the static structure of a program. Once we have 
realized that, it is straightforward to derive an abstract interpretation that yields a summarization 
of object reachability. We have presented an operational semantics that derives object names from 
the dynamic structure of a program's call tree. We discussed several abstractions of this naming 
scheme that allow us to model the allocation and connectivity of objects with varying degrees of 
precision. 

The technique of using abstract interpretation to derive an analysis method from the semantics of a 
programming language shows great promise. The lifetime analysis presented in this thesis is precise 
enough to yield great reductions in the usage of storage in many non-trivial scientific applications. 
Our experiments showed that deallocation code inserted by the compiler could reclaim eighty to one 
hundred percent of the storage allocated by a program. While we do not claim that compilers will 
have this level of effectiveness for all programs, we do claim that there is a large class of programs 
for which these methods are very effective. 

11.1 Further Research 

This thesis is by no means the last word in lifetime analysis. We have taken another step by defining 
a lifetime analysis framework for non-strict, parallel languages, but a number of issues remain to 
be investigated. 

11.1.1 Computing Object Lifetimes 

The algorithm that we described to compute function environments in the abstract interpreter 
is very straightforward, but not necessarily very efficient. The process of computing function 
environments needs to be as efficient as possible if abstract interpretation is going to be a practical 
tool. 

11.1.2 Subscript Analysis

The interaction of subscript analysis and abstract interpretation is an area that could be explored 
further. Can we do better analysis of programs using arrays if we can determine that certain 
arrays have distinct subregions with potentially different behaviors? For instance, in some scientific 
programs, arrays are created where all of the border elements are shared and all of the inner 
elements are unique. If we could use subscript analysis to distinguish these regions during abstract 
interpretation, we might be able to determine that all of the interior elements could be deallocated 
without having to test for uniqueness. 

11.1.3 Determining Acyclicity of Recursive Objects 

We feel that, by modifying our abstract interpreter, we should be able to perform sharing analysis 
of recursively-typed objects. The goal of this sharing analysis would be to annotate recursively 
typed object representations to indicate whether they form trees, acyclic graphs, or cyclic graphs. 
This information should allow us to distinguish between objects that are definitely trees, objects 
that are definitely acyclic and objects that may be cyclic. This information would be useful because 
the compiler can generate more efficient code to reclaim trees and lists than to reclaim graphs and 
cyclic structures. 

Hendren [20] and Harrison [19] both can determine whether objects are acyclic using information 
about the allocation time of the nodes of a recursively typed object. They used this information 
to determine when statements or subexpressions could be executed in parallel. However, their 
methods depend on having a sequential interpreter, so the methods do not apply to our work. 

The insight we had that allowed us to collect sharing information for the elements of arrays created 
with MakeArray should carry over to recursive objects: the MakeArray construct provides a good 
encapsulation of the expression evaluated to obtain the elements of an array. We can determine if 
sharing is possible by observing the boundary of the encapsulation and seeing if any objects cross 
it, or are inherited. The values that cross the boundary may be shared by the different elements of 
the array. 

Basically, we need to unfold a recursive function once during analysis to determine if the recursive 
calls to the function can share values with the initial call to the function. If there is no sharing 
between the initial call and the recursive calls, then there can be no sharing between any of the 
calls because each of the recursive calls can be considered to be an initial call. Unfortunately, we 
have not seen how to formalize this condition in such a way that it can be included in our lifetime 
analysis method. If we proceed to unfold every recursive call once during abstract interpretation, 
then abstract interpretation will not terminate. Every iteration of the computation of the function 
environment will yield one more input value to which the recursive function must be applied. 

Lent [30] explored the selective unfolding of recursive procedure calls to determine acyclicity of lists.
He proposed a special mechanism for unfolding function calls one extra time using renamed labels 
and then collapsing the renamed values back into the original domain. This extra level of labels 
should allow us to detect sharing and to annotate the unshared objects, so that we can preserve the 
sharing information once the labels are recompressed. We would like to investigate this technique 
in more detail to determine if it is sound and to extend it beyond detecting acyclic lists to detecting 
trees or directed acyclic graphs.

11.1.4 Deallocating Complex Structures

The problem of generating code to deallocate complex structures is related to the problem of 
determining the acyclicity and sharing of complex structures. The current implementation of the 
deallocation insertion algorithm in the Id compiler has a few special cases for inserting code to 
deallocate single cons cells, potentially cyclic lists, and acyclic lists. The problem of generating 
code to traverse and deallocate recursive objects is still open. The compiler may be able to generate 
a procedure for each type to deallocate complete objects of that type. The compiler could then 
compose these special deallocation procedures to deallocate objects consisting of the composition 
of several types of objects. 
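
The idea of composing per-type deallocation procedures can be sketched as follows. This is a
hypothetical illustration, not the code the Id compiler generates. Heap, Cell, free_scalar,
free_tuple, and free_list are invented names, the heap only tracks which addresses are live, and
the structures are assumed to be acyclic and unshared.

```python
from typing import Any, Callable, Set, Tuple

class Cell:
    """A heap object: an address plus a tuple of fields."""
    def __init__(self, addr: int, fields: Tuple[Any, ...]) -> None:
        self.addr = addr
        self.fields = fields

class Heap:
    """Toy heap that records only which addresses are currently live."""
    def __init__(self) -> None:
        self.live: Set[int] = set()
        self._next = 0

    def alloc(self, *fields: Any) -> Cell:
        cell = Cell(self._next, fields)
        self.live.add(cell.addr)
        self._next += 1
        return cell

    def free(self, cell: Cell) -> None:
        if cell.addr not in self.live:
            raise RuntimeError("double free of address %d" % cell.addr)
        self.live.remove(cell.addr)

Dealloc = Callable[[Heap, Any], None]

def free_scalar(heap: Heap, value: Any) -> None:
    """Scalars own no heap storage, so there is nothing to do."""

def free_tuple(*free_fields: Dealloc) -> Dealloc:
    """Build a deallocator for a tuple from deallocators for its fields."""
    def free_it(heap: Heap, cell: Cell) -> None:
        for free_field, field in zip(free_fields, cell.fields):
            free_field(heap, field)
        heap.free(cell)
    return free_it

def free_list(free_elem: Dealloc) -> Dealloc:
    """Build a deallocator for an acyclic cons list (head, tail) of elements."""
    def free_it(heap: Heap, cell: Any) -> None:
        while cell is not None:
            head, tail = cell.fields
            free_elem(heap, head)
            heap.free(cell)
            cell = tail
    return free_it
```

A deallocator for a list of pairs of scalars is then free_list(free_tuple(free_scalar, free_scalar)).
Applying it to a two-element list frees the two cons cells and the two pairs they point to.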

The problem of deallocating nested or recursive structures is exacerbated when the pattern of 
sharing within the structure is complex or unknown. Perhaps the run-time system could provide a
function that recursively descends a structure and deallocates all unique objects in that structure. 
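
One possible shape for such a run-time function is sketched below. It assumes that every object
carries a hypothetical shared flag, set by the compiler or the run-time system when the object may
have more than one reference. The names Node and free_unique are illustrative. The traversal stops
at a shared object and does not descend below it, since anything beneath it may still be reachable
through the other reference.

```python
from typing import List, Sequence

class Node:
    """An object with child references and an assumed 'shared' annotation."""
    def __init__(self, name: str, children: Sequence["Node"] = (),
                 shared: bool = False) -> None:
        self.name = name
        self.children = list(children)
        self.shared = shared

def free_unique(root: Node) -> List[str]:
    """Descend from root, returning the names of the objects that would be freed.

    Shared objects are skipped and left for the garbage collector.
    """
    freed: List[str] = []
    seen = set()           # guards against revisiting nodes, e.g. in a cycle
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if node.shared:
            continue       # possibly referenced elsewhere: do not free or descend
        freed.append(node.name)
        stack.extend(node.children)
    return freed
```

Here freed objects are only recorded by name. A real run-time system would return their storage
to the allocator instead.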

11.1.5 Interaction with Garbage Collection 

Another area that deserves more attention is the interaction of explicit storage management with 
garbage collection. Is it really possible for the two to coexist such that the use of explicit deallocation 
commands decreases the overhead of garbage collection? One approach that we think is worth 
considering is having the compiler generate code to allocate storage in an area separate from the 
garbage collected heap. This code can explicitly deallocate the whole area when the objects in it 
are all dead. 
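
The separate-area approach resembles what is often called region or arena allocation. The sketch
below models only the bookkeeping. Python is itself garbage collected, so no storage is actually
reclaimed here, and the Region class and its methods are hypothetical names rather than part of any
existing run-time system.

```python
from typing import Any, List

class Region:
    """A storage area outside the collected heap, released as a whole."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: List[Any] = []
        self.released = False

    def alloc(self, value: Any) -> Any:
        """Place a value in the region and return it."""
        if self.released:
            raise RuntimeError("allocation in released region " + self.name)
        self.objects.append(value)
        return value

    def release(self) -> int:
        """Deallocate every object in the region at once; return how many."""
        count = len(self.objects)
        self.objects.clear()
        self.released = True
        return count
```

The compiler would emit the call to release at the point where lifetime analysis shows that every
object in the region is dead. The garbage collector never needs to trace or reclaim those objects.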

Another possibility is to have a dynamic storage manager and a garbage collector that coexist in 
one space. Explicit deallocation commands can be used to deallocate storage. Whatever storage is 
not deallocated explicitly will eventually be deallocated by the garbage collector. 

11.1.6 M-Structures 

Full-fledged Id and KID both have M-structures [7, 6], which are useful when writing programs 
that compute histograms, implement graph algorithms, or implement run-time system code. M- 
structures are mutable structures that allow mutually exclusive access to each word. 

We would like to see our instrumented and abstract interpreters augmented to handle programs 
using M-structures. We believe that M-structures can be modeled safely in our abstract interpreter 
in the same fashion as I-Structures. But our solution for modeling abstract M-structures does not 
solve the problem of modeling M-structures in the instrumented interpreter. It seems that the store 
would have to be threaded through the interpreter in order for the interpreter to model mutually 
exclusive access to each M-structure element. We would like to find a solution to this problem that 
does not obscure the parallelism of the interpreter. 

Once M-structures are added to KID, we will have to model barriers in full generality, which
may involve computing a graph of activation label precedence. This precedence relation would be 
analogous to our terminates-before relation. Once we have computed the precedence graph, we
may be able to determine in some cases whether programs deadlock. 

11.2 Other Research Directions

Other semantic analyses are useful for a wide variety of reasons. Strictness analysis is helpful 
in determining that portions of a program may be sequentialized. Sequentialization is a useful 
optimization for compiling non-strict languages because it eliminates redundant synchronization. 
Dependence analysis and interference analysis are also important analyses in the field of optimizing
compilers.

The abstract interpretation framework presented in this thesis is a sound basis for a wide variety 
of other such analyses of non-strict or parallel programs. By changing the abstract evaluators and 
value domains presented in this thesis, the abstract interpreter can be restructured to support
these other data-dependent analysis methods.